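I'm looking for a way to concatenate the values in two Python dictionaries that contain numpy arrays, while avoiding manually looping over the dictionary keys. For example: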

import numpy as np

# Create first dictionary
n = 5
s = np.random.randint(1,101,n)
r = np.random.rand(n)
d = {"r":r,"s":s}
print("d = ", d)

# Create second dictionary
n = 2
s = np.random.randint(1,101,n)
r = np.random.rand(n)
t = np.array(["a","b"])
d2 = {"r":r,"s":s,"t":t}
print("d2 = ", d2)

# Some operation to combine the two dictionaries...
d = SomeOperation(d,d2)  # placeholder for the operation being asked about

# Updated dictionary
print("d3 = ", d)

which gives the output

>> d =  {'s': array([75, 25, 88, 54, 82]), 'r': array([ 0.1021227 ,  0.99454874, 0.38680718,  0.98720877,  0.8662894 ])}
>> d2 =  {'s': array([78, 92]), 'r': array([ 0.27610587,  0.57037473]), 't': array(['a', 'b'], dtype='|S1')}
>> d3 =  {'s': array([75, 25, 88, 54, 82, 78, 92]), 'r': array([ 0.1021227 ,  0.99454874, 0.38680718,  0.98720877,  0.8662894, 0.27610587,  0.57037473]), 't': array(['a', 'b'], dtype='|S1')}

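That is, if a key already exists, the new values are appended to the numpy array stored under that key.

Does anyone know the best way to do this while minimising the use of slow manual for loops? (I'd like to avoid loops because the dictionaries I want to combine may have hundreds of keys.)

Thanks!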

1 Answer

You can use pandas for this:

from __future__ import print_function, division
import pandas as pd
import numpy as np

# Create first dictionary
n = 5
s = np.random.randint(1,101,n)
r = np.random.rand(n)
d = {"r":r,"s":s}
df = pd.DataFrame(d)
print(df)

# Create second dictionary
n = 2
s = np.random.randint(1,101,n)
r = np.random.rand(n)
t = np.array(["a","b"])
d2 = {"r":r,"s":s,"t":t}
df2 = pd.DataFrame(d2)
print(df2)

print(pd.concat([df, df2]))

Output:

          r   s
0  0.551402  49
1  0.620870  34
2  0.535525  52
3  0.920922  13
4  0.708109  48
          r   s  t
0  0.231480  43  a
1  0.492576  10  b
          r   s    t
0  0.551402  49  NaN
1  0.620870  34  NaN
2  0.535525  52  NaN
3  0.920922  13  NaN
4  0.708109  48  NaN
0  0.231480  43    a
1  0.492576  10    b
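
Note that pd.concat pads the missing 't' entries with NaN, whereas the d3 in the question keeps 't' as just ['a', 'b']. If the exact dictionary from the question is needed, a plain numpy version is one option. The following is only a minimal sketch: merge_array_dicts is a hypothetical helper name, not part of numpy or pandas, and it assumes every value is a 1-D array. It still loops over the keys, but each iteration does a single np.concatenate, so a few hundred keys should be cheap.

```python
import numpy as np

def merge_array_dicts(a, b):
    """Hypothetical helper: return a new dict in which arrays stored
    under keys shared by a and b are concatenated (a first, then b)."""
    merged = dict(a)  # shallow copy, so the input dicts are not modified
    for key, arr in b.items():
        if key in merged:
            # Key exists in both: append b's values after a's values
            merged[key] = np.concatenate([merged[key], arr])
        else:
            # Key only in b: carry the array over unchanged
            merged[key] = arr
    return merged

# Small fixed inputs instead of random ones, so the result is easy to check
d = {"r": np.array([0.1, 0.9]), "s": np.array([75, 25])}
d2 = {"r": np.array([0.3]), "s": np.array([78]), "t": np.array(["a"])}
d3 = merge_array_dicts(d, d2)
print(d3)
```

Keys that appear in only one dictionary are kept as they are, so no NaN padding is introduced and the integer dtype of 's' is preserved.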
answered 2013-04-19T16:03:15.510