
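I'm explaining what I'm actually trying to do, in case there's a higher-level suggestion that eliminates the problem entirely.

I have scientific data stored in three arrays: wave, flux, error. These hold wavelength, flux, and error values. The arrays are about 4000 elements long (an array's index number corresponds to the detector's pixel number).

I run various tests, but for this example, suppose I run 2 tests and need to efficiently mask the associated arrays.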

masks = []
masks.append(wave > 5500.35)
masks.append(flux / wave > 8.5)

Sub-question: I can easily do the 2-mask case, e.g.:

fullmask = [x[0] and x[1] for x in zip(masks[0], masks[1])]

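But is there a way to do this for an arbitrary number of masks?

The real question: is there a way to apply all of the masks to each of the arrays (wave, flux, error) while preserving the original index numbers? By "preserving the original index numbers," I mean that in principle I could take the mean pixel number (original index number) of the masked wave array. That is: if wave[98:99] (pixels 98 and 99) were the only part left unmasked, the mean pixel would be 98.5.

Meta-question: is this the best way of going about this?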


Edit

So here is some example data to work with.

wave = array([5000, 5001, 5002, 5003, 5004, 5005, 5006, 5007, 5008, 5009, 5010,
   5011, 5012, 5013, 5014, 5015, 5016, 5017, 5018, 5019, 5020, 5021,
   5022, 5023, 5024, 5025, 5026, 5027, 5028, 5029, 5030, 5031, 5032,
   5033, 5034, 5035, 5036, 5037, 5038, 5039, 5040, 5041, 5042, 5043,
   5044, 5045, 5046, 5047, 5048, 5049, 5050, 5051, 5052, 5053, 5054,
   5055, 5056, 5057, 5058, 5059, 5060, 5061, 5062, 5063, 5064, 5065,
   5066, 5067, 5068, 5069, 5070, 5071, 5072, 5073, 5074, 5075, 5076,
   5077, 5078, 5079, 5080, 5081, 5082, 5083, 5084, 5085, 5086, 5087,
   5088, 5089, 5090, 5091, 5092, 5093, 5094, 5095, 5096, 5097, 5098,
   5099])

flux = array([ 112.65878609,  109.2008992 ,  113.30629929,  117.17002715,
   103.19663878,  110.42131523,  106.00841123,  100.27882741,
   103.89160905,  102.29402469,  105.58894696,  103.21314852,
    96.97242814,  106.70130478,  108.83891225,  110.60598803,
    95.10361887,  109.39734257,  103.08289878,  104.97258911,
    96.46606257,  106.75993458,   99.25386914,  105.91429417,
   105.83752232,  100.53312657,   99.74871394,  107.12735837,
   108.81187473,   96.51418895,   99.71311101,   94.08702553,
    98.81198643,   93.84567201,  103.21444519,   94.7027134 ,
    99.61842203,  103.71336458,  100.8697998 ,   92.1564786 ,
    96.56711985,   94.7728761 ,   82.65194671,   83.52280884,
    86.57960844,   73.6700194 ,   66.11794666,   61.01624627,
    63.19944529,   55.50283247,   62.09172307,   59.55436092,
    75.66399466,   70.69397378,   64.27899192,   73.80248662,
    89.17119606,   78.97024327,   82.3334254 ,  100.82581489,
   102.77937201,   99.37717696,   96.2215563 ,  104.52291339,
    93.7581944 ,   93.32154346,  103.57018896,  108.08682518,
   105.2711359 ,  100.00242988,  100.86934866,  103.20764384,
   104.19274473,  101.3314802 ,  102.75057114,   94.02347591,
    95.48758551,  106.0099397 ,   99.50733501,   97.88110415,
   107.54266965,  107.76126331,   98.14882302,  101.55654606,
   101.02418212,  106.82324958,   95.52086925,  102.65957133,
   104.93806492,  103.22762427,  108.02087993,  106.71911141,
    97.24396195,  103.3450277 ,  113.99870588,  106.4145751 ,
   110.08294674,  109.40908288,  118.61518086,  114.37341062])

error = array([ 11.72799338,  22.33423611,  16.89347382,  12.80063102,
   23.99242356,  25.15863754,  20.44765811,  14.84358628,
   19.16343785,  19.5703491 ,  18.44427035,  19.08648083,
   19.09116433,  12.22098884,  14.81280352,  11.35010222,
   18.59850136,  15.78855734,  21.85877638,  20.12179042,
   22.04894395,  21.986731  ,  13.26738352,  16.10987762,
   24.28528627,  30.11866128,  25.30220842,  25.02100014,
   29.38560916,  16.8192307 ,  29.15097205,  23.56805267,
   15.17285709,  18.27495747,  18.63750452,  18.61618504,
   11.45940025,  21.95805701,  24.22923951,  11.76824052,
   19.75465065,  14.72979889,  15.45936176,  14.73227474,
   28.91683627,  22.90534472,  16.82376093,  21.47830226,
   20.05012214,  16.74393817,  17.79456361,  20.80008233,
   19.32059989,  23.23471888,  13.77434964,  17.56121752,
   15.96716163,  18.5294016 ,  28.31005939,  13.66340359,
   10.38160267,  16.09621015,  18.25125683,  20.95954331,
   21.31996941,  24.51998489,  16.58831953,  15.25427142,
   23.93065281,  30.4552266 ,  16.94527367,  16.92730802,
   17.79659417,  18.85080572,  18.0839428 ,  23.93949481,
   26.60243553,  13.68320208,  16.74669921,  20.30238694,
   12.74773905,  19.20810456,  20.7189417 ,  20.73402554,
   17.12106905,  25.06475175,  13.0947528 ,  28.16437938,
   22.4803386 ,  13.71143627,   6.60617725,  20.41186825,
   23.54924934,  22.25930658,  20.09337438,  24.94705884,
   18.58056249,   5.58653271,  18.71242702,  17.83578444])


# How I created masks, or just jump to next comment if it's too painful to look at...
masks = []
masks.append(flux/error > 4.0) # high error
absorptionMask1 = (wave < 5060)
absorptionMask2 = (wave > 5040)
bob = [all(x) for x in zip(absorptionMask1, absorptionMask2)]
absorptionMask = ~np.array(bob)
masks.append(absorptionMask) 

# The resulting mask
masks = [array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True, False, False,
       True, False,  True, False, False,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True, False,
      False, False, False, False, False, False, False, False, False,
       True,  True,  True,  True, False,  True,  True,  True,  True,
       True,  True, False,  True,  True,  True, False,  True,  True,
       True,  True,  True, False, False,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True, False,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool),
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True, False, False, False, False,
      False, False, False, False, False, False, False, False, False,
      False, False, False, False, False, False,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)]


# More in a bit, should get you a feel for what I'm looking at. 
5 Answers

Alternatively, you can use boolean operators. Let's define an example:

import numpy as np

d = np.arange(10)
masks = [d > 5, d % 2 == 0, d < 8]

You can use reduce to combine all of them:

from functools import reduce

total_mask = reduce(np.logical_and, masks)

If you need to choose the masks manually, you can also use the boolean operators explicitly:

total_mask = masks[0] & masks[1] & masks[2]
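
To connect this back to the question, the combined mask can index every array at once, and a pixel-number array indexed the same way keeps the original index numbers. A minimal sketch, assuming small made-up stand-ins for wave, flux, and error (the values, the thresholds, and the `good_*` names are illustrative, not the asker's data):

```python
from functools import reduce

import numpy as np

# Made-up stand-in data; the real arrays have about 4000 elements.
wave = np.array([5000.0, 5001.0, 5002.0, 5003.0, 5004.0, 5005.0])
flux = np.array([100.0, 20.0, 110.0, 105.0, 30.0, 120.0])
error = np.array([10.0, 10.0, 10.0, 50.0, 10.0, 10.0])

# Pixel numbers are simply the original array indices.
pixel = np.arange(wave.size)

# Illustrative tests; any number of same-length boolean arrays works.
masks = [flux / error > 4.0, wave > 5000.5, flux > 50.0]
total_mask = reduce(np.logical_and, masks)

# Boolean indexing keeps only the elements where total_mask is True.
good_wave = wave[total_mask]
good_flux = flux[total_mask]
good_error = error[total_mask]
good_pixel = pixel[total_mask]

# Mean of the original index numbers that survived all masks.
mean_pixel = good_pixel.mean()
print(good_pixel, mean_pixel)
```

Because `pixel` is filtered with the same mask as the data arrays, the surviving entries stay aligned element by element.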
answered 2012-07-18T08:08:49.420

I think you're looking for the star operator (argument unpacking with *):

fullmask = [all(mask) for mask in zip(*masks)]

...although I'm not sure I fully understand your data structures.
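
Since the masks are NumPy boolean arrays of equal length, the same element-wise "all" can also be computed without a Python-level loop. A minimal sketch, using a small made-up list of masks as the assumed input:

```python
import numpy as np

# Made-up example masks of equal length.
d = np.arange(10)
masks = [d > 5, d % 2 == 0, d < 8]

# Pure-Python version: one tuple per element position, then all().
fullmask_list = [all(mask) for mask in zip(*masks)]

# Vectorized equivalents: treat the masks as rows and reduce along axis 0.
fullmask_all = np.all(masks, axis=0)
fullmask_reduce = np.logical_and.reduce(masks)

print(fullmask_all)
```

Both vectorized forms return a NumPy boolean array, so the result can be used directly for indexing, unlike the plain list produced by the list comprehension.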

answered 2012-07-18T07:35:06.180

How about using numpy record arrays?

import numpy as np

# create some data
pixel = np.arange(4000)
wave = pixel / 4000. + 5500
flux = pixel / 4000. + 9.5 * 5500
data = np.rec.fromarrays((pixel, wave, flux), names='pixel, wave, flux')

mask = data.wave > 5500.25
mask &= data.flux / data.wave > 8.5

print(data[mask].pixel.mean())
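
A related option is NumPy's masked-array module, `numpy.ma`, which hides rejected elements without changing the array length, so positions still line up with pixel numbers. A minimal sketch with the same made-up data as the record-array example; note that in `numpy.ma` a mask value of True means "ignore this element", the opposite of the `mask` used for boolean indexing, and the names `keep`, `masked_pixel`, and `masked_wave` are illustrative:

```python
import numpy as np

# Same made-up data as the record-array example.
pixel = np.arange(4000)
wave = pixel / 4000.0 + 5500
flux = pixel / 4000.0 + 9.5 * 5500

# True where an element passes every test.
keep = (wave > 5500.25) & (flux / wave > 8.5)

# numpy.ma masks mark elements to IGNORE, so invert the "keep" mask.
masked_pixel = np.ma.array(pixel, mask=~keep)
masked_wave = np.ma.array(wave, mask=~keep)

# Reductions skip masked elements; the arrays keep their full length.
print(masked_pixel.mean())
print(masked_wave.shape)
```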
answered 2012-07-18T08:13:15.740

If I understand correctly, what you want is to filter the array.

Here is an example of filtering an array:

your_array = [1, 5, 6000]
# In Python 3, filter() returns an iterator, so wrap it in list()
list(filter(lambda elem: elem > 5000, your_array))

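This returns [6000].

When you say "preserving the original index numbers," I think you mean that you want to test a condition on each element and store the result for each one? If so, you probably want to use map: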

your_array = [1, 5, 6000]
# In Python 3, map() returns an iterator, so wrap it in list()
list(map(lambda elem: elem > 5000, your_array))

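This returns [False, False, True].

If you have a more complicated condition, you can replace any of the lambdas with a function you define.

P.S. I think it would help if you gave the example input and the example output you want. The wording of the question is confusing.

Edit:

With the example data, I think this is what you want; feel free to comment. This approach avoids storing a list of True/False values and then looking up the indices of the elements you want. It returns a list of indices directly, so you can compute the mean in fewer steps.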

# Given wave, error, and flux the way you defined

# If wave is [21.2, 34.1, 43.423], then this returns [(0, 21.2), (1, 34.1), (2, 43.423)]
# Each element is now a tuple of (index, elem)
enum_wave = enumerate(wave)

# Returns a list of the indexes that pass the condition
# For example, if only 98, and 99 aren't filtered out, this will return [98, 99]
masked_wave = [index for index, elem in enum_wave if elem > 5060]

# To find the average
sum(masked_wave) / float(len(masked_wave))
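
If wave is a NumPy array, the same list of passing indices can be obtained without an explicit loop. A sketch using `np.flatnonzero`, with a short made-up wave array standing in for the real data (the names `passing` and `mean_index` are illustrative):

```python
import numpy as np

# Made-up stand-in for the wave array.
wave = np.array([5058, 5059, 5060, 5061, 5062])

# Indices of the elements that pass the condition.
passing = np.flatnonzero(wave > 5060)

# Mean of the original index numbers.
mean_index = passing.mean()
print(passing, mean_index)
```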
answered 2012-07-18T08:26:03.563

You can also do this instead of using functools.reduce:

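# Start from an all-True mask shaped like one individual mask,
# not like the list of masks (len(masks) is the number of masks)
combined_mask = np.full(masks[0].shape, True)
for mask in masks:
    combined_mask &= mask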
answered 2020-09-20T04:41:24.363