I have a text file with simulation data (60 columns, 100k rows):

a  b   c  
1  11 111
2  22 222
3  33 333
4  44 444

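... where the first row holds the variable names and the columns below hold the corresponding data (floats).

I need to use all of these variables, together with their data, in Python for further calculations. For example, when I write:

print(b)

I need to get the values from the second column.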

I know how to import the data:

data = np.genfromtxt("1.txt", unpack=True, skip_header=1)

and how to assign the variables "manually":

a, b, c = np.genfromtxt("1.txt", unpack=True, skip_header=1)

But I am having trouble getting the variable names:

reader = csv.reader(open("1.txt", "rt"))
for row in reader: 
   list.append(row)
variables=(list[0])  

How can I change this code to get all the variable names from the first row and assign them to the imported arrays?

4 Answers

3

Instead of trying to assign names, you might think about using an associative array, which is known in Python as a dict, to store your variables and their values. The code could then look something like this (borrowing liberally from the csv docs):

import csv
with open('1.txt', 'rt') as f:
  reader = csv.reader(f, delimiter=' ', skipinitialspace=True)

  lineData = list()

  cols = next(reader)
  print(cols)

  for col in cols:
    # Create a list in lineData for each column of data.
    lineData.append(list())


  for line in reader:
    for i in range(len(lineData)):
      # Copy the data from the line into the correct columns.
      lineData[i].append(line[i])

  data = dict()

  for i in range(len(cols)):
    # Create each key in the dict with the data in its column.
    data[cols[i]] = lineData[i]

print(data)

data then contains each of your variables, which can be accessed via data['varname'].

So, for example, you could do data['a'] to get the list ['1', '2', '3', '4'] given the input provided in your question.
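
Note that the values stored this way are strings. If floats are needed, the same dict-based idea could be sketched roughly as below. This is only an illustration: it uses plain `str.split()` instead of `csv.reader`, reads from an in-memory stand-in (`SAMPLE`) rather than `1.txt`, and the helper name `read_columns` is hypothetical.

```python
import io

# Stand-in for the contents of 1.txt (assumed to be whitespace-separated).
SAMPLE = """a  b   c
1  11 111
2  22 222
3  33 333
4  44 444
"""

def read_columns(f):
    """Return {column name: list of floats} from a whitespace-separated file."""
    names = next(f).split()                 # first line: the variable names
    columns = {name: [] for name in names}
    for line in f:
        fields = line.split()
        if not fields:                      # skip blank lines
            continue
        for name, value in zip(names, fields):
            columns[name].append(float(value))
    return columns

data = read_columns(io.StringIO(SAMPLE))
print(data['b'])  # [11.0, 22.0, 33.0, 44.0]
```

With a real file, `read_columns(open('1.txt'))` inside a `with` block should behave the same way, assuming the file has the layout shown in the question.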

I think trying to create names based on data in your document might be a rather awkward way to do this, compared to the dict-based method shown above. If you really want to do that, though, you might look into reflection in Python (a subject I don't really know anything about).

Answered 2013-08-10T01:41:21.247
2

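The answer is: you don't want to do that.

Dictionaries are designed for exactly this purpose. The data structure you actually want looks something like: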

data = {
    "a": [1, 2, 3, 4],
    "b": [11, 22, 33, 44],
    "c": [111, 222, 333, 444],
}

... which you can then easily access with, e.g., data["a"].
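
One way such a dictionary might be built with NumPy is sketched below. It relies on the `names=True` option of `np.genfromtxt`, which takes field names from the first line; the in-memory `SAMPLE` text is a stand-in for `1.txt` and is an assumption made for illustration.

```python
import io

import numpy as np

# Stand-in for the contents of 1.txt (assumed to be whitespace-separated).
SAMPLE = """a  b   c
1  11 111
2  22 222
3  33 333
4  44 444
"""

# names=True makes genfromtxt read the column names from the header line
# and return a structured array with one float field per column.
table = np.genfromtxt(io.StringIO(SAMPLE), names=True)

# Turn the structured array into a plain dict of 1-D float arrays.
data = {name: table[name] for name in table.dtype.names}

print(sorted(data))
print(data["b"])
```

For the real file, passing `"1.txt"` instead of the `StringIO` object should be enough, provided the header names are valid field names.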

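It is possible to do what you want, but the usual way is a hack that relies on the fact that Python internally uses (drumroll) a dict to store variables. And since your code doesn't know the names of those variables, you'll be stuck using dictionary access to get at them as well ... so you might as well just use a dictionary in the first place.

It's worth pointing out that this is deliberately made difficult in Python: if your code doesn't know the names of your variables, they are by definition data rather than logic, and should be treated as such.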
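
To make the point concrete, here is a small sketch, assuming the hack meant here is writing into the dict returned by `globals()`. The `columns` data is made up for illustration.

```python
# Hypothetical column data standing in for values read from a file.
columns = {"a": [1, 2, 3, 4], "b": [11, 22, 33, 44]}

# Module-level variables are stored in the dict returned by globals().
namespace = globals()
print(type(namespace))    # <class 'dict'>

# The hack: write the entries straight into that dict.
namespace.update(columns)
print(b)                  # works only because the name "b" was typed by hand

# Code that does not know the names in advance must look them up by string,
# which is dictionary access again, just on a less tidy dict.
for name in columns:
    print(name, namespace[name] is columns[name])
```

The loop at the end is the telling part: as soon as the names come from the file rather than from the source code, the lookup is a dictionary lookup either way.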

In case you're not convinced yet, here's a good article on the subject:

Stupid Python Ideas: Why you don't want to dynamically create variables

Answered 2013-08-10T01:37:17.287
0

Thanks to @andyg0808 and @Zero Piraeus, I found another solution. For me, the most suitable approach is to use the Pandas data analysis library.

import pandas as pd

# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
data = pd.read_csv("1.txt", sep=r"\s+")

result = data["a"] * data["b"] * 3
print(result)

  0     33
  1    132
  2    297
  3    528

... where 0, 1, 2, 3 are the row indices.
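
If plain arrays or a dict are needed later on, the DataFrame can be converted back. The sketch below shows one possible way, using an in-memory stand-in (`SAMPLE`) for `1.txt`; the variable names `df`, `b`, and `columns` are chosen for illustration only.

```python
import io

import pandas as pd

# Stand-in for the contents of 1.txt (assumed to be whitespace-separated).
SAMPLE = """a  b   c
1  11 111
2  22 222
3  33 333
4  44 444
"""

# The header row supplies the column names.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

print(list(df.columns))        # column names taken from the first row
b = df["b"].to_numpy()         # a single column as a NumPy array
columns = df.to_dict("list")   # every column at once, as {name: list}
print(b.sum(), columns["c"])
```

Columns whose names are valid identifiers can also be read as attributes (`df.b`), which comes close to the `print(b)` usage asked for in the question.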

Answered 2013-08-10T20:20:57.917
0

Here is a simple way to convert a .txt file of variable names and data into NumPy arrays.

import numpy as np

D = np.genfromtxt('1.txt', dtype='str')     # load the data in as strings
D_data = np.asarray(D[1:, :], dtype=float)  # convert the data rows to floats
D_names = D[0, :]                           # first row: the variable names

for i in range(len(D_names)):
    key = D_names[i]                        # name of this variable
    val = D_data[:, i]                      # column of values for this variable
    exec(key + '=val')                      # create a module-level variable with that name

I like this approach because it is easy to follow and easy to maintain. We can condense the code as follows:

D = np.genfromtxt('1.txt',dtype='str')     # load the data in as strings
for i in range(D.shape[1]):
    val = np.asarray(D[1::,i],dtype=float) # set the value for this variable 
    exec(D[0,i] + '=val')                  # build the variable 

Both snippets do the same thing: they create NumPy arrays named a, b and c holding the associated data.
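
Because `exec` runs whatever text is in the header row as Python code, a header that is not a plain identifier will fail or do something unintended. If dotted access is the goal, one possible alternative (not part of the answer above) is to collect the columns in a `types.SimpleNamespace`. The `SAMPLE` text and the name `cols` are assumptions for illustration.

```python
import io
from types import SimpleNamespace

import numpy as np

# Stand-in for the contents of 1.txt (assumed to be whitespace-separated).
SAMPLE = """a  b   c
1  11 111
2  22 222
3  33 333
4  44 444
"""

D = np.genfromtxt(io.StringIO(SAMPLE), dtype=str)  # load everything as strings
names = D[0, :]                                    # first row: variable names
values = D[1:, :].astype(float)                    # remaining rows: the data

# Attach each column as an attribute instead of exec-ing an assignment.
cols = SimpleNamespace(
    **{str(name): values[:, i] for i, name in enumerate(names)}
)

print(cols.b)
```

The arrays are then reached as `cols.a`, `cols.b`, and so on, and `vars(cols)` returns the underlying dict when the names have to be iterated.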

Answered 2017-11-12T20:07:55.207