python - How to remove "None" from an appended multidimensional array using numpy

Question

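I need to take a csv file and import its data into a multidimensional array in python, but after appending the data to an empty array I'm not sure how to remove the None values from the array.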

I first created a structure like this:

storecoeffs = numpy.empty((5,11), dtype='object')

This returns a 5-row by 11-column array filled with None.

Next, I opened my csv file and converted it to an array:

coeffsarray = list(csv.reader(open("file.csv")))

coeffsarray = numpy.array(coeffsarray, dtype='object')

Then I appended the two arrays:

newmatrix = numpy.append(storecoeffs, coeffsarray, axis=1)

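The result is an array filled with None values followed by the data I want (the first two rows are shown to give you an idea of the nature of my data):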

array([[None, None, None, None, None, None, None, None, None, None, None,
    workers, constant, hhsize, inc1, inc2, inc3, inc4, age1, age2,
    age3, age4],[None, None, None, None, None, None, None, None, None, None, None,
    w0, 7.334, -1.406, 2.823, 2.025, 0.5145, 0, -4.936, -5.054, -2.8, 0],,...]], dtype=object)

How do I remove those None objects from each row, so that what I'm left with is a 5 x 11 multidimensional array containing my data?

score 1 · Accepted Answer

Why allocate a whole array of Nones and append to it? Isn't coeffsarray the array you want?

Edit

Oh. Use numpy.reshape.

import numpy
coeffsarray = numpy.reshape( coeffsarray, ( 5, 11 ) )

score 1 · Accepted Answer


Start with an empty array?

storecoeffs = numpy.empty((5,0), dtype='object')
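
A minimal sketch of this suggestion, assuming a CSV with 5 rows and 11 columns. The in-memory `csv_text` is made-up stand-in data for file.csv, not the asker's real file. Because the starting array has zero columns, appending along axis=1 should leave only the appended data:

```python
import csv
import io

import numpy

# Made-up stand-in for file.csv: 5 rows x 11 columns (assumption, not the real data).
csv_text = "\n".join(
    ",".join("r%dc%d" % (r, c) for c in range(11)) for r in range(5)
)

# Zero columns means there are no None placeholders to strip afterwards.
storecoeffs = numpy.empty((5, 0), dtype='object')

coeffsarray = numpy.array(list(csv.reader(io.StringIO(csv_text))), dtype='object')
newmatrix = numpy.append(storecoeffs, coeffsarray, axis=1)

print(newmatrix.shape)
```

If a None-padded array from the question already exists, slicing off the first 11 columns with `newmatrix[:, 11:]` should recover the same data.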

Answered 2010-08-06T19:31:03.870

score 1 · Accepted Answer

Why not simply use numpy.loadtxt():

newmatrix = numpy.loadtxt("file.csv", dtype='object', delimiter=',')

That should do the job, if I understand your question correctly.
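
As a rough, self-contained illustration of the loadtxt approach: the `csv_text` below is made-up stand-in data for file.csv. This sketch passes dtype=str instead of dtype='object' (my substitution, since the cells are text) and sets delimiter=',' so that each line is split on commas:

```python
import io

import numpy

# Made-up stand-in for file.csv (assumption): a header row plus one data row.
csv_text = "workers,constant,hhsize\nw0,7.334,-1.406\n"

# dtype=str keeps every cell as text; delimiter=',' splits each line on commas.
newmatrix = numpy.loadtxt(io.StringIO(csv_text), dtype=str, delimiter=',')

print(newmatrix.shape)
print(newmatrix[1, 0])
```

No pre-allocated array is involved, so there are no None entries to remove.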

score 1 · Accepted Answer

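@Gnibbler's answer is technically correct, but there's no reason to create the initial storecoeffs array in the first place. Just load your values and then create an array from them. As @Mermoz pointed out, though, your use case looks simple enough for numpy.loadtxt().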

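Beyond that, why are you using an object array? It's probably not what you want... Right now, you're storing the numeric values as strings, not floats!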
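
To make the strings-versus-floats point concrete, here is a small sketch using made-up numeric rows (an assumption, not the asker's file). csv.reader yields text for every cell, so the object array holds Python str values until it is explicitly converted:

```python
import csv
import io

import numpy as np

# Made-up numeric rows standing in for part of the CSV (assumption).
csv_text = "7.334,-1.406,2.823\n2.025,0.5145,0\n"

as_objects = np.array(list(csv.reader(io.StringIO(csv_text))), dtype='object')
print(type(as_objects[0, 0]))   # each cell is a Python str, not a number

# Converting to a float array makes arithmetic behave numerically.
as_floats = as_objects.astype(float)
print(as_floats.sum(axis=0))
```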

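You essentially have two ways to handle your data in numpy. If you want easy access to named columns, use a structured array (or a record array). If you want a "normal" multidimensional array, just use an array of floats, ints, etc. Object arrays have a specific purpose, but it's probably not what you're doing.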

For example, to just load the data as a normal 2D numpy array (assuming all of your data can easily be represented as floats):

import numpy as np
# Note that this ignores your column names, and attempts to 
# convert all values to a float...
data = np.loadtxt('input_filename.txt', delimiter=',', skiprows=1)

# Access the first column 
workers = data[:,0]

To load the data as a structured array, you could do something like this:

import numpy as np
infile = open('input_filename.txt')

# Read in the names of the columns from the first row...
names = next(infile).strip().split(',')

# Make a dtype from these names...
dtype = {'names':names, 'formats':len(names)*[float]}

# Read the data in...
data = np.loadtxt(infile, dtype=dtype, delimiter=',')

# Note that data is now effectively 1-dimensional. To access a column,
# index it by name
workers = data['workers']

# Note that this is now one-dimensional... You can't treat it like a 2D array
data[1:10, 3:5] # <-- Raises an error!

data[1:10][['inc1', 'inc2']] # <-- Effectively the same thing, but works..

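If you have non-numeric values in your data and want to handle them as strings, you'll need to use a structured array, specify which fields you want to be strings, and set a maximum length for the strings in those fields.

From your example data, it looks like the first column, "workers", is a non-numeric value that you may want to store as a string, and the rest look like floats. In that case, you'd do something like this: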

import numpy as np
infile = open('input_filename.txt')
names = next(infile).strip().split(',')

# Create the dtype... The 'S10' indicates a string field with a length of 10
dtype = {'names':names, 'formats':['S10'] + (len(names) - 1)*[float]}
data = np.loadtxt(infile, dtype=dtype, delimiter=',')

# The "workers" field is now a string array
print(data['workers'])

# Compare this to the other fields
print(data['constant'])

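If in some cases you really do need the flexibility of the csv module (e.g. text fields containing commas), you can use it to read in the data and then convert it to a structured array with the appropriate dtype.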
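
One possible way to do that conversion, sketched with made-up data in `csv_text` (an assumption standing in for the real file). The 'U10' string format is also my choice, picked so the field holds ordinary text under Python 3. The structured array is built from a list of tuples, one per row:

```python
import csv
import io

import numpy as np

# Made-up stand-in for the CSV file (assumption): header row, then data rows.
csv_text = "workers,constant,hhsize\nw0,7.334,-1.406\nw1,1.5,2.0\n"

reader = csv.reader(io.StringIO(csv_text))
names = next(reader)

# First field as a string of up to 10 characters, the rest as floats.
dtype = np.dtype({'names': names, 'formats': ['U10'] + (len(names) - 1) * [float]})

# Structured arrays are built from a list of tuples, one tuple per row.
rows = [(row[0],) + tuple(float(v) for v in row[1:]) for row in reader]
data = np.array(rows, dtype=dtype)

print(data['workers'])
print(data['constant'])
```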

Hope that makes things a bit clearer...
