2

Here is an MWE of my code:

import numpy as np

# Load data from file.
data = np.genfromtxt('data_input', dtype=None, unpack=True)

print data

This is a sample of the data_input file:

01_500_aa_1000    990.0    990.0   112.5      0.2       72  0  0  1  0  0  0  0  0  0   0   0   0   1
02_500_aa_0950    990.0    990.0   112.5      0.2       77  0  0  1  0  0  0  0  0  0   0   0   0   1
03_500_aa_0600    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1
04_500_aa_0700    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1

The unpack argument does not seem to have any effect, because it always prints:

[ ('01_500_aa_1000', 990.0, 990.0, 112.5, 0.2, 72, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('02_500_aa_0950', 990.0, 990.0, 112.5, 0.2, 77, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('03_500_aa_0600', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('04_500_aa_0700', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)]

Can anyone reproduce this? What am I doing wrong?

4 Answers

2

You get this because genfromtxt returns a numpy record array (a structured array), not a list. It only looks like a list of tuples when you print() it to the console.

from cStringIO import StringIO  # Python 2 module, matching the print statements below
from numpy import genfromtxt
raw = """01_500_aa_1000    990.0    990.0   112.5      0.2       72  0  0  1  0  0  0  0  0  0   0   0   0   1
02_500_aa_0950    990.0    990.0   112.5      0.2       77  0  0  1  0  0  0  0  0  0   0   0   0   1
03_500_aa_0600    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1
04_500_aa_0700    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1"""
sio = StringIO(raw)
data = genfromtxt(sio, dtype=None, unpack=False)
print data
print
print data.dtype

gives:

[ ('01_500_aa_1000', 990.0, 990.0, 112.5, 0.2, 72, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('02_500_aa_0950', 990.0, 990.0, 112.5, 0.2, 77, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('03_500_aa_0600', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('04_500_aa_0700', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)]

[('f0', 'S14'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<i8'), ('f6', '<i8'), ('f7', '<i8'), ('f8', '<i8'), ('f9', '<i8'), ('f10', '<i8'), ('f11', '<i8'), ('f12', '<i8'), ('f13', '<i8'), ('f14', '<i8'), ('f15', '<i8'), ('f16', '<i8'), ('f17', '<i8'), ('f18', '<i8')]

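unpack=True and unpack=False seem to return the same thing because the result is a recarray: it is one-dimensional (one record per row), so there is nothing to transpose. I'd advise you to try pandas and forget about recarrays altogether. You can pass a recarray to pandas.DataFrame and be done! For example,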

from pandas import DataFrame
df = DataFrame(data)
print df
print
print df.f0

yields:

               f0         f1         f2         f3         f4  f5  f6  f7  f8  \
0  01_500_aa_1000     990.00     990.00     112.50       0.20  72   0   0   1   
1  02_500_aa_0950     990.00     990.00     112.50       0.20  77   0   0   1   
2  03_500_aa_0600     990.00     990.00     112.50       0.18  84   0   0   1   
3  04_500_aa_0700     990.00     990.00     112.50       0.18  84   0   0   1   

   f9  f10  f11  f12  f13  f14  f15  f16  f17  f18  
0   0    0    0    0    0    0    0    0    0    1  
1   0    0    0    0    0    0    0    0    0    1  
2   0    0    0    0    0    0    0    0    0    1  
3   0    0    0    0    0    0    0    0    0    1  

0    01_500_aa_1000
1    02_500_aa_0950
2    03_500_aa_0600
3    04_500_aa_0700
Name: f0, dtype: object
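
If you would rather stay in NumPy, here is a minimal sketch of getting one array per column out of the structured array by hand. It assumes Python 3 and a NumPy version that accepts the `encoding` argument; the shortened sample and the names `raw`, `columns`, `labels` and `counts` are only for illustration. As far as I know, newer NumPy releases changed `unpack=True` so that structured results are split into per-column arrays, so the behaviour in the question may depend on your version.

```python
import io
import numpy as np

# Shortened, hypothetical version of the data_input sample (8 columns instead of 19).
raw = """01_500_aa_1000  990.0  990.0  112.5  0.2   72  0  1
02_500_aa_0950  990.0  990.0  112.5  0.2   77  0  1
03_500_aa_0600  990.0  990.0  112.5  0.18  84  0  1
04_500_aa_0700  990.0  990.0  112.5  0.18  84  0  1"""

data = np.genfromtxt(io.StringIO(raw), dtype=None, encoding="utf-8")

# Mixed column types produce a 1-D structured array: one record per row.
print(data.shape)        # a single dimension, so transposing changes nothing
print(data.dtype.names)  # auto-generated field names f0, f1, ...

# Build the "unpacked" result by hand: one array per field.
columns = [data[name] for name in data.dtype.names]
labels, counts = columns[0], columns[5]
print(labels)
print(counts)
```

Each entry of `columns` keeps its own dtype (text, float or int), which a single transposed 2-D array could not do.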
answered 2013-08-29T01:44:42.987
2

As @Phillip Cloud mentioned, you get a recarray because of the mixed data types (strings and numbers); the strings in column 0 are what cause it.

You can work around this by importing column 0 separately from the numeric columns:

>>> np.genfromtxt('data_input', usecols=range(1,19))
array([[  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          2.00000000e-01,   7.20000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   1.00000000e+00],
       [  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          2.00000000e-01,   7.70000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   1.00000000e+00],
       [  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          1.80000000e-01,   8.40000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   1.00000000e+00],
       [  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          1.80000000e-01,   8.40000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   1.00000000e+00]])
>>> np.genfromtxt('data_input', usecols=0,dtype=None)
array(['01_500_aa_1000', '02_500_aa_0950', '03_500_aa_0600',
       '04_500_aa_0700'], 
      dtype='|S14')

Alternatively, you can refer to the columns of the recarray by field name, like this:

>>> data['f0']
array(['01_500_aa_1000', '02_500_aa_0950', '03_500_aa_0600',
       '04_500_aa_0700'], 
      dtype='|S14')
>>> data['f5']
array([72, 77, 84, 84])
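
To tie this back to the original question: once the text column is read separately, the numeric part is a plain 2-D float array and `unpack=True` does what you would expect. The sketch below assumes Python 3 and a NumPy version that accepts the `encoding` argument; the shortened sample and the variable names (`x`, `y`, `z`, `frac`, `count`, `flag_a`, `flag_b`) are made up for illustration.

```python
import io
import numpy as np

# Shortened, hypothetical version of the data_input sample (8 columns instead of 19).
raw = """01_500_aa_1000  990.0  990.0  112.5  0.2   72  0  1
02_500_aa_0950  990.0  990.0  112.5  0.2   77  0  1
03_500_aa_0600  990.0  990.0  112.5  0.18  84  0  1
04_500_aa_0700  990.0  990.0  112.5  0.18  84  0  1"""

# Numeric columns only: a homogeneous float array, which unpack=True transposes
# so that the first axis runs over the original columns.
numeric = np.genfromtxt(io.StringIO(raw), usecols=range(1, 8), unpack=True)

# Text column on its own.
labels = np.genfromtxt(io.StringIO(raw), usecols=0, dtype=None, encoding="utf-8")

# One variable per column, as unpack=True is meant to allow.
x, y, z, frac, count, flag_a, flag_b = numeric
print(numeric.shape)
print(labels)
print(count)
```

The trade-off is that every numeric column becomes a float, including the ones that were integers in the file.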
answered 2013-08-29T09:00:20.923
0

I can reproduce this. However, if you change dtype to float, the data is unpacked, although the text column turns into nan:

[[             nan              nan              nan              nan]
 [  9.90000000e+02   9.90000000e+02   9.90000000e+02   9.90000000e+02]
 [  9.90000000e+02   9.90000000e+02   9.90000000e+02   9.90000000e+02]
 [  1.12500000e+02   1.12500000e+02   1.12500000e+02   1.12500000e+02]
 [  2.00000000e-01   2.00000000e-01   1.80000000e-01   1.80000000e-01]
 [  7.20000000e+01   7.70000000e+01   8.40000000e+01   8.40000000e+01]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 ...

I got the idea from this mailing list question.

Look at an answer given here. np.genfromtxt() returns an ndarray, and a plain ndarray cannot be heterogeneous, which is why mixed column types cannot be unpacked into a single 2-D array.
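
For reference, a small sketch of the dtype=float call described above, again assuming Python 3 and using a shortened, made-up sample in place of the real file:

```python
import io
import numpy as np

# Shortened, hypothetical version of the data_input sample (8 columns instead of 19).
raw = """01_500_aa_1000  990.0  990.0  112.5  0.2   72  0  1
02_500_aa_0950  990.0  990.0  112.5  0.2   77  0  1
03_500_aa_0600  990.0  990.0  112.5  0.18  84  0  1
04_500_aa_0700  990.0  990.0  112.5  0.18  84  0  1"""

# Forcing a single float dtype gives a regular 2-D array, so unpack=True works.
cols = np.genfromtxt(io.StringIO(raw), dtype=float, unpack=True)

# The text column cannot be parsed as a float, so it comes back as NaN.
print(np.isnan(cols[0]).all())

# Drop it to keep only the columns that carry real numbers.
numeric = cols[1:]
print(numeric.shape)
```

The labels are lost this way, so this only helps if you do not need column 0.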

answered 2013-08-28T23:50:44.580
0

I am posting my own answer because this is what I ended up using.

import numpy as np

# Load data from file.
data = np.genfromtxt('data_input', dtype=None)

# Transpose the rows into columns (list() keeps this working on Python 3 as well).
data = list(zip(*data))

This actually works, and it is easy to understand and use.
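
A self-contained sketch of the same idea on Python 3, where zip returns an iterator and needs to be wrapped in list(). The shortened sample and the names `raw`, `columns` and `counts` are assumptions for illustration; going through `tolist()` is my own variation, chosen so that each row is a plain tuple of Python values.

```python
import io
import numpy as np

# Shortened, hypothetical version of the data_input sample (8 columns instead of 19).
raw = """01_500_aa_1000  990.0  990.0  112.5  0.2   72  0  1
02_500_aa_0950  990.0  990.0  112.5  0.2   77  0  1
03_500_aa_0600  990.0  990.0  112.5  0.18  84  0  1
04_500_aa_0700  990.0  990.0  112.5  0.18  84  0  1"""

data = np.genfromtxt(io.StringIO(raw), dtype=None, encoding="utf-8")

# tolist() turns each record into a plain tuple; zip(*rows) regroups them by column.
columns = list(zip(*data.tolist()))
print(columns[0])
print(columns[5])

# A single column can be turned back into an array when numeric work is needed.
counts = np.array(columns[5])
print(counts.sum())
```

Note that the result is a list of tuples rather than a NumPy array, so array operations only become available again after converting the columns you need.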

answered 2013-08-29T10:43:37.813