python - How to make NumPy create a matrix with strings and floats

Question

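OK, I've done a lot of research on this topic, and I know that NumPy only supports homogeneous matrices.

I'm working with some corpus linguistics data in Python using the NLTK package, and I just want to make a matrix with different strings as the "column names" and the actual data values (floats) as the rest of the matrix.

So far, I've made two matrices, one with the strings and one with the floats, and used vstack to put one on top of the other. All was fine and dandy until I tried to use NumPy's savetxt() function on this new stacked "matrix": it won't write to the .csv file because the matrix isn't "matrix-y", since it isn't homogeneous. FML.

I'd really like to be able to use NumPy for all of its great methods for working with the actual data points, but I can't get one measly array of strings on top of the matrix to go into a .csv. Any ideas? I'd really rather not have to attempt all of this again using Python's list methods for multi-dimensional arrays.

Here's the code: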

import os.path
import sys
import nltk
from numpy import *
from nltk.corpus.reader import CHILDESCorpusReader
from nltk.probability import ConditionalFreqDist, FreqDist

n_rows = 12
n_cols = 19
init_row = 0
init_col = 0
neg_words = ["Age", "MLU", "All    Tokens","no","not","don't","can't","won't","isn't","wasn't","wouldn't","shouldn't","couldn't","didn't","haven't","aren't","haven't","hasn't","doesn't"]

Matrix_headers = array(range(len(neg_words)), dtype='a12')
Matrix_values = zeros(n_rows*n_cols).reshape((n_rows, n_cols))  # the matrix with the data points (floats)

for entry in range(len(neg_words)):
    Matrix_headers[entry] = neg_words[entry]

p = neg_words
q = Matrix_values
Matrix = vstack([p,q])


out_name = "/Users/nicholasmoores/Documents/Research/neg_table.csv"
savetxt(out_name, Matrix, fmt='%.3e',delimiter = "\t")

raw_input("\n\nPress the enter key to exit.")
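
For anyone reproducing the problem, here is a minimal sketch of what seems to be going on. It assumes Python 3 and a recent NumPy, uses a shortened, made-up header list, and writes to an in-memory buffer rather than the path above. vstack coerces everything to a single string dtype, and savetxt then cannot apply a float format to string elements:

```python
import io
import numpy as np

headers = ["Age", "MLU", "no"]   # shortened header list, for illustration only
values = np.zeros((2, 3))        # float data, standing in for Matrix_values

# Stacking strings on top of floats yields ONE homogeneous array,
# so the floats are converted to strings.
stacked = np.vstack([headers, values])
print(stacked.dtype)             # a string dtype, not float64

# A float format such as '%.3e' cannot be applied to string elements,
# so savetxt raises a TypeError about the dtype/format mismatch.
error = None
try:
    np.savetxt(io.StringIO(), stacked, fmt="%.3e", delimiter="\t")
except TypeError as exc:
    error = exc
print(type(error).__name__)
```
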

score 3 · Accepted Answer

You can use a structured array.

For example:

>>> ym = np.zeros(len(neg_words), dtype=[('heads','a14'),('vals','f4',(n_rows,))])
>>> ym
array([('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
       ('', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])], 
      dtype=[('heads', 'S14'), ('vals', '<f4', (12,))])

To set the header values:

ym['heads'] = neg_words
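
As a self-contained sketch of the same idea (assumptions: a shortened header list, a small n_rows, made-up data values, and the unicode dtype 'U14' in place of the byte-string 'a14' so the headers compare equal to ordinary Python 3 strings), each record pairs one header with its own column of floats:

```python
import numpy as np

neg_words = ["Age", "MLU", "no", "not"]   # shortened list, for illustration
n_rows = 3                                # small size, for illustration

# One record per header; 'vals' holds that header's column of n_rows floats.
ym = np.zeros(len(neg_words),
              dtype=[("heads", "U14"), ("vals", "f4", (n_rows,))])
ym["heads"] = neg_words

# ym["vals"] is a view with shape (len(neg_words), n_rows), so writing
# into it updates the structured array itself.
ym["vals"][1] = [1.5, 2.0, 2.5]           # made-up values for the "MLU" column

# Look a column up by its header name.
mlu = ym[ym["heads"] == "MLU"]["vals"][0]
print(mlu)
```
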

To access the headers:

>>> ym['heads']
array(['Age', 'MLU', 'All    Tokens', 'no', 'not', "don't", "can't",
   "won't", "isn't", "wasn't", "wouldn't", "shouldn't", "couldn't",
   "didn't", "haven't", "aren't", "haven't", "hasn't", "doesn't"], 
   dtype='|S14')

Similarly, to access the values:

ym['vals']
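
To get from here to the tab-delimited file the question asks for, one possible approach (not part of the original answer) is to pass the headers through the header argument of savetxt and write only the float block. The sketch below assumes a shortened header list, the 'U14' dtype instead of 'a14', and an in-memory buffer standing in for the output path:

```python
import io
import numpy as np

neg_words = ["Age", "MLU", "no"]   # shortened list, for illustration
n_rows = 2

ym = np.zeros(len(neg_words),
              dtype=[("heads", "U14"), ("vals", "f4", (n_rows,))])
ym["heads"] = neg_words

out = io.StringIO()                # stand-in for a real file path

# ym["vals"] has one row per header, so transpose it to get one
# column per header. comments="" stops savetxt from prefixing the
# header line with "# ".
np.savetxt(out, ym["vals"].T, fmt="%.3e", delimiter="\t",
           header="\t".join(ym["heads"]), comments="")

lines = out.getvalue().splitlines()
print(lines[0])                    # the header row
```

Only the homogeneous float block goes through the float format this way; the strings never need to share an array with the numbers.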
