python - Storing these vectors, but which data structure to use in Python

Question

Each iteration of the loop below produces a vector of size 50x1. I want to store all the vectors from the loop together in a single data structure.

def get_y_hat(y_bar, x_train, theta_Ridge_Matrix):
    print(theta_Ridge_Matrix.shape)
    print(theta_Ridge_Matrix.shape[0])
    for i in range(theta_Ridge_Matrix.shape[0]):
        yH = np.dot(x_train, theta_Ridge_Matrix[i].T)
        print(yH)

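Which data structure should I use? I'm new to Python, but from what I have read online there are two options: a numpy array or a list of lists.

Later I will need to access each 50-element vector outside this method. I will be storing 200 to 500 vectors.

Could someone also give me sample code for such a data structure?

Thanks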

score 2 · Accepted Answer

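I think storing the data from your loop in a dict and then converting it to a pandas.DataFrame (which is built on top of numpy arrays) should be an efficient solution. It lets you process your data further either as a whole or as individual vectors.

For example: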

import pandas as pd
import numpy as np

data = {}
# this would be your loop
for i in range(50):
    data['run_%02d' % i] = np.random.randn(50)
data = pd.DataFrame(data)  # the dict keys become the column names

You can access individual vectors as attributes or by key:

print(data['run_42'].describe())  # or data.run_42.describe()

count    50.000000
mean      0.021426
std       1.027607
min      -2.472225
25%      -0.601868
50%       0.014949
75%       0.641488
max       2.391289

Or analyze the data further as a whole:

print(data.mean())

run_00   -0.015224
run_01   -0.006971
..
run_48   -0.115935
run_49    0.147738

Or take a look at your data using matplotlib (since you tagged the question with matplotlib):

import matplotlib.pyplot as plt

data.boxplot(rot=90)
plt.tight_layout()

[Image: example boxplot]
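As a rough sketch of how the dict-to-DataFrame idea could be applied to the function from the question: the version below returns the DataFrame so the vectors are available outside the function. The array sizes (50 samples, 4 features, 200 parameter vectors), the random stand-in data, the dropped y_bar argument and the run_NNN column names are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd


def get_y_hat(x_train, theta_ridge_matrix):
    """Return one prediction vector per row of theta_ridge_matrix, as DataFrame columns."""
    data = {}
    for i in range(theta_ridge_matrix.shape[0]):
        # Each product has one entry per row of x_train (50 entries here).
        data['run_%03d' % i] = np.dot(x_train, theta_ridge_matrix[i])
    return pd.DataFrame(data)


# Hypothetical stand-in data: 50 samples, 4 features, 200 parameter vectors.
rng = np.random.RandomState(0)
x_train = rng.randn(50, 4)
theta_ridge_matrix = rng.randn(200, 4)

y_hat = get_y_hat(x_train, theta_ridge_matrix)
print(y_hat.shape)                          # (50, 200)
first_vector = y_hat['run_000'].to_numpy()  # plain 50-element numpy array
```

Each column is one vector, so y_hat['run_000'] retrieves a single vector by name while y_hat as a whole can still be summarized or plotted.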

score 0 · Accepted Answer

You can simply do

import numpy as np

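def get_y_hat(y_bar, x_train, theta_Ridge_Matrix):
    print(theta_Ridge_Matrix.shape)
    print(theta_Ridge_Matrix.shape[0])
    # np.empty takes the shape as a single tuple: one row per parameter
    # vector, one column per row of x_train (the length of each product).
    yH = np.empty((theta_Ridge_Matrix.shape[0], x_train.shape[0]))
    for i in range(theta_Ridge_Matrix.shape[0]):
        yH[i, :] = np.dot(x_train, theta_Ridge_Matrix[i].T)
    print(yH)
    return yH  # so the vectors can be used outside the function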

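If you store theta_Ridge_Matrix in a 3D array, you can also let np.dot do the work with yH = np.dot(x_train, theta_Ridge_Matrix), which sums over the last axis of x_train and the second-to-last axis of theta_Ridge_Matrix.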
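For the simpler case where theta_Ridge_Matrix is a plain 2D array with one parameter vector per row (which is what the loop indexing suggests, though that is an assumption), the whole loop can be replaced by a single np.dot call. The shapes and random data below are stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
x_train = rng.randn(50, 4)               # assumed: 50 samples, 4 features
theta_ridge_matrix = rng.randn(200, 4)   # assumed: 200 parameter vectors

# Loop version: one 50-element vector per row of theta_ridge_matrix.
y_loop = np.empty((theta_ridge_matrix.shape[0], x_train.shape[0]))
for i in range(theta_ridge_matrix.shape[0]):
    y_loop[i, :] = np.dot(x_train, theta_ridge_matrix[i])

# Vectorized version: a single matrix product gives the same (200, 50) array.
y_vec = np.dot(theta_ridge_matrix, x_train.T)

print(y_vec.shape)                 # (200, 50)
print(np.allclose(y_loop, y_vec))  # True
```

Row i of y_vec is the vector produced by iteration i of the loop, so y_vec[i] gives access to any single vector afterwards.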

score 0 · Accepted Answer

I suggest using numpy. To install it on Windows, use this site:

http://sourceforge.net/projects/numpy/files/NumPy/

Here are some examples of how to use it.

import numpy as np

We will create an array and name it mat

>>> mat = np.random.randn(2,3)
>>> mat
array([[ 1.02063865, 1.52885147, 0.45588211],
       [-0.82198131, 0.20995583, 0.31997462]])

Transpose the array using the "T" attribute

>>> mat.T
array([[ 1.02063865, -0.82198131],
       [ 1.52885147, 0.20995583],
       [ 0.45588211, 0.31997462]])

Change the shape of any array using the "reshape" method

>>> mat = np.random.randn(3,6)
>>> mat
array([[ 2.01139326, 1.33267072, 1.2947112 , 0.07492725, 0.49765694,
         0.01757505],
       [ 0.42309629, 0.95921276, 0.55840131, -1.22253606, -0.91811118,
         0.59646987],
       [ 0.19714104, -1.59446001, 1.43990671, -0.98266887, -0.42292461,
        -1.2378431 ]])
>>> mat.reshape(2,9)
array([[ 2.01139326, 1.33267072, 1.2947112 , 0.07492725, 0.49765694,
         0.01757505, 0.42309629, 0.95921276, 0.55840131],
       [-1.22253606, -0.91811118, 0.59646987, 0.19714104, -1.59446001,
         1.43990671, -0.98266887, -0.42292461, -1.2378431 ]])

We can also change the shape of an array in place by assigning to its "shape" attribute.

>>> mat = np.random.randn(4,3)
>>> mat.shape
(4, 3)
>>> mat
array([[-1.47446507, -0.46316836, 0.44047531],
       [-0.21275495, -1.16089705, -1.14349478],
       [-0.83299338, 0.20336677, 0.13460515],
       [-1.73323076, -0.66500491, 1.13514327]])
>>> mat.shape = 2,6
>>> mat.shape
(2, 6)

>>> mat
array([[-1.47446507, -0.46316836, 0.44047531, -0.21275495, -1.16089705,
        -1.14349478],
       [-0.83299338, 0.20336677, 0.13460515, -1.73323076, -0.66500491,
         1.13514327]])
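To tie these operations back to the question, here is a small sketch of one possible workflow: collect the vectors in a list, turn the list into a single 2D array, and then use T and reshape to get whichever layout is needed. The counts (200 vectors of 50 elements) and the random values are assumptions standing in for the real loop results.

```python
import numpy as np

rng = np.random.RandomState(0)
# Stand-in for the loop: a list of 200 one-dimensional 50-element arrays.
vectors = [rng.randn(50) for _ in range(200)]

mat = np.array(vectors)        # 2D array, one vector per row
print(mat.shape)               # (200, 50)
print(mat.T.shape)             # (50, 200): one vector per column

col = mat[3].reshape(50, 1)    # vector number 3 as an explicit 50x1 column
print(col.shape)               # (50, 1)
```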

score 0 · Accepted Answer

I can't comment on numpy arrays because I haven't used them before, but Python has built-in support for a list of lists.

For example, doing this:

AList = [1, 2, 3]
BList = [4, 5, 6]
CList = [7, 8, 9]
List_of_Lists = []

List_of_Lists.append(AList)
List_of_Lists.append(BList)
List_of_Lists.append(CList)

print(List_of_Lists)

This produces:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

There are also other ways to create the lists instead of initializing them all up front, for example:

ListCreator = int(input('Input how many lists are needed: '))
ListofLists = [[] for index in range(ListCreator)]

There are more ways to go about it, but I don't know how you plan to implement it.
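One way the list-of-lists approach might look for the loop in the question is sketched below. The function name collect_vectors and the placeholder values are hypothetical; in real code each inner list would hold the 50 values computed in one iteration.

```python
def collect_vectors(n_vectors, length):
    """Build a list of lists; each inner list is one vector."""
    list_of_lists = []
    for i in range(n_vectors):
        # Hypothetical placeholder values standing in for a computed vector.
        vector = [i * length + j for j in range(length)]
        list_of_lists.append(vector)
    return list_of_lists


vectors = collect_vectors(200, 50)
print(len(vectors))      # 200
print(len(vectors[0]))   # 50
print(vectors[2][:3])    # [100, 101, 102]
```

Returning the list makes every vector reachable outside the function by index, for example vectors[2].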
