I have two numpy.ndarrays, x and y:

>>> x = np.ndarray(shape=(10,), buffer=np.array([0.9902, 0.9394, 0.839,  0.8574, 0.9174, 0.8742, 0.8955, 0.9196, 0.9388, 0.9602]), dtype=float)

[0.9902 0.9394 0.839  0.8574 0.9174 0.8742 0.8955 0.9196 0.9388 0.9602]

>>> y = np.ndarray(shape=(10,), buffer=np.array([0.956, 0.884, 0.875, 0.880, 0.865, 0.870, 0.861, 0.817, 0.771, 0.727]), dtype=float)

[0.956, 0.884, 0.875, 0.880, 0.865, 0.870, 0.861, 0.817, 0.771, 0.727]

and a Series, edge_or_not:

>>> d = {'2020-03-17 04:39:00+03:00': 0,
          '2020-03-17 04:40:00+03:00': 1,
          '2020-03-17 04:41:00+03:00': 0,
          '2020-03-17 04:42:00+03:00': -1,
          '2020-03-17 04:43:00+03:00': 0,
          '2020-03-17 04:44:00+03:00': 0,
          '2020-03-17 04:45:00+03:00': 1,
          '2020-03-17 04:46:00+03:00': -1,
          '2020-03-17 04:47:00+03:00': -1,
          '2020-03-17 04:48:00+03:00': -1}

>>> edge_or_not = pd.Series(data=d)

2020-03-17 04:39:00+03:00    0
2020-03-17 04:40:00+03:00    1
2020-03-17 04:41:00+03:00    0
2020-03-17 04:42:00+03:00   -1
2020-03-17 04:43:00+03:00    0
2020-03-17 04:44:00+03:00    0
2020-03-17 04:45:00+03:00    1
2020-03-17 04:46:00+03:00   -1
2020-03-17 04:47:00+03:00   -1
2020-03-17 04:48:00+03:00   -1
dtype: int64

I get up_edge_x, up_edge_y, down_edge_x and down_edge_y like this:

>>> up_edge_x = x[edge_or_not > 0]

array([0.9394, 0.8955])

>>> up_edge_y = y[edge_or_not > 0]

array([0.884, 0.861])

>>> down_edge_x = x[edge_or_not < 0]

array([0.8574, 0.9196, 0.9388, 0.9602])

>>> down_edge_y = y[edge_or_not < 0]

array([0.88 , 0.817, 0.771, 0.727])

all_edges_x, all_edges_y:

>>> all_edges_x = x[edge_or_not != 0]

array([0.9394, 0.8574, 0.8955, 0.9196, 0.9388, 0.9602])

>>> all_edges_y = y[edge_or_not != 0]

array([0.884, 0.88 , 0.861, 0.817, 0.771, 0.727])

Then I create the DataFrames:

>>> up_edge = pd.DataFrame({'y':up_edge_y}, index=up_edge_x)

            y   (pos)
0.9394  0.884       0
0.8955  0.861       1

>>> down_edge = pd.DataFrame({'y':down_edge_y}, index=down_edge_x)

            y   (pos)
0.8574  0.880       0
0.9196  0.817       1
0.9388  0.771       2
0.9602  0.727       3

All I need is to create the all_edges DataFrame with 3 columns: 'y', 'edge' and 'pos', where 'pos' is the row's position in up_edge or down_edge:

>>> all_edges = pd.DataFrame({'y':all_edges_y, 'edge':edge_or_not[edge_or_not != 0].to_numpy(), 
                         'pos':???},
                          index=all_edges_x)

In the end, the all_edges DataFrame must look like this:

            y  edge  pos
0.9394  0.884     1    0
0.8574  0.880    -1    0
0.8955  0.861     1    1
0.9196  0.817    -1    1
0.9388  0.771    -1    2
0.9602  0.727    -1    3

How can I compute the third column, pos, so that I can link rows of all_edges to the up_edge and down_edge DataFrames, as in this silly example:

>>> down_x1 = 0.9602
>>> loc = down_edge.index.get_loc(down_x1)
>>> edges = all_edges.loc[all_edges['pos']==loc]['edge']
>>> print(edges)

0.9602   -1
Name: edge, dtype: int64

I also have a second question: how do I get an array of positions from another DataFrame's index? Something like this:

>>> locations = down_edge.index.get_loc(#mb all indexes)

[0, 1, 2, 3]

2 Answers

Use:

up_edge_x = x[edge_or_not > 0]
up_edge_y = y[edge_or_not > 0]

down_edge_x = x[edge_or_not < 0]
down_edge_y = y[edge_or_not < 0]

all_edges_x = x[edge_or_not != 0]
all_edges_y = y[edge_or_not != 0]

First create Series that hold a range of positions, indexed by up_edge_x and down_edge_x:

up_edge = pd.Series(range(len(up_edge_x)), index=up_edge_x, name='pos')
down_edge = pd.Series(range(len(down_edge_x)), index=down_edge_x, name='pos')
print (up_edge)
0.9394    0
0.8955    1
Name: pos, dtype: int64

print (down_edge)
0.8574    0
0.9196    1
0.9388    2
0.9602    3
Name: pos, dtype: int64

Then concatenate them:

pos = pd.concat([up_edge, down_edge])
print (pos)
0.9394    0
0.8955    1
0.8574    0
0.9196    1
0.9388    2
0.9602    3
Name: pos, dtype: int64
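
Note that this concatenated Series acts as a lookup table from x value to position, so the mapping step that follows assumes every edge x value is unique and that the floats match exactly. A minimal sketch of a guard, assuming a hypothetical helper named `build_pos_lookup` (it is not part of pandas):

```python
import numpy as np
import pandas as pd


def build_pos_lookup(up_x, down_x):
    """Hypothetical helper: Series mapping x label -> position within its group.

    Raises ValueError if any label occurs twice, because the lookup
    would then be ambiguous.
    """
    pos = pd.concat([
        pd.Series(np.arange(len(up_x)), index=up_x, name="pos"),
        pd.Series(np.arange(len(down_x)), index=down_x, name="pos"),
    ])
    if not pos.index.is_unique:
        dupes = pos.index[pos.index.duplicated()].unique().tolist()
        raise ValueError(f"duplicate x labels, lookup is ambiguous: {dupes}")
    return pos


# Same edge x values as in the question.
pos = build_pos_lookup(
    np.array([0.9394, 0.8955]),
    np.array([0.8574, 0.9196, 0.9388, 0.9602]),
)
print(pos.tolist())  # [0, 1, 0, 1, 2, 3]
```

If duplicate x values are possible in real data, computing positions from the edge sign instead of from the labels avoids the problem.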

Finally, map the positions into the new column:

all_edges = pd.DataFrame({'y':all_edges_y,
                          'edge':edge_or_not[edge_or_not != 0].to_numpy(), 
                          'pos': pd.Index(all_edges_x).map(pos)},
                          index=all_edges_x)


print (all_edges)
            y  edge  pos
0.9394  0.884     1    0
0.8574  0.880    -1    0
0.8955  0.861     1    1
0.9196  0.817    -1    1
0.9388  0.771    -1    2
0.9602  0.727    -1    3
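
For the second question (positions of several labels at once), one possible approach is `Index.get_indexer`, which takes a list of labels and returns their integer positions. The sketch below rebuilds `down_edge` from the question's values; the `wanted` list, including the absent label 0.5, is made up for illustration.

```python
import numpy as np
import pandas as pd

# down_edge as built in the question.
down_edge = pd.DataFrame(
    {"y": [0.880, 0.817, 0.771, 0.727]},
    index=[0.8574, 0.9196, 0.9388, 0.9602],
)

# Positions of every row, in order, are simply 0 .. len-1.
all_locations = np.arange(len(down_edge))

# Positions of an arbitrary list of labels; -1 marks a label that is absent.
# get_indexer requires the index to be unique.
wanted = [0.9602, 0.8574, 0.5]
locations = down_edge.index.get_indexer(wanted)

print(all_locations.tolist())  # [0, 1, 2, 3]
print(locations.tolist())      # [3, 0, -1]
```

Unlike `get_loc`, which raises `KeyError` for a missing label, `get_indexer` reports it as -1, so the result may need filtering before it is used for positional indexing.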
answered 2021-06-29T12:58:29.063

I think I would not depend on up_edge and down_edge at all, and would simply do it this way:

>>> all_edges['pos'] = all_edges.groupby(all_edges['edge']).cumcount()

with the all_edges DataFrame created beforehand like this:

>>> all_edges = pd.DataFrame({'y':all_edges_y, 'edge':edge_or_not[edge_or_not != 0].to_numpy()}, index=all_edges_x)
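
Put in execution order, the two statements above can be sketched end to end as follows. The timestamp index of `edge_or_not` is left out here as a simplification, since only the values are used. `cumcount` numbers the rows within each `edge` group in order of appearance, which corresponds to the position each row would have in `up_edge` or `down_edge`.

```python
import numpy as np
import pandas as pd

x = np.array([0.9902, 0.9394, 0.839, 0.8574, 0.9174,
              0.8742, 0.8955, 0.9196, 0.9388, 0.9602])
y = np.array([0.956, 0.884, 0.875, 0.880, 0.865,
              0.870, 0.861, 0.817, 0.771, 0.727])
# Simplification: default integer index instead of the question's timestamps.
edge_or_not = pd.Series([0, 1, 0, -1, 0, 0, 1, -1, -1, -1])

# Boolean mask of rows that are an edge (up or down).
mask = (edge_or_not != 0).to_numpy()

all_edges = pd.DataFrame(
    {"y": y[mask], "edge": edge_or_not[mask].to_numpy()},
    index=x[mask],
)

# Running count of rows within each edge value (+1 and -1 counted separately).
all_edges["pos"] = all_edges.groupby("edge").cumcount()

print(all_edges["pos"].tolist())  # [0, 0, 1, 1, 2, 3]
```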
answered 2021-07-03T12:07:18.013