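I am trying to read a .csv file that contains several data formats. I am using Python 3.2 and NumPy 1.9, and I read the data with the `genfromtxt` function. I would like to convert the data while reading it, so that it is stored appropriately rather than processed afterwards; for this I use the `converters` option.

Using more than one converter function seems to be a problem. The code, its input, and its output are listed below. As you can see, the first line of output comes from a different column of the input file than the others.

Has anyone used this feature before? Is there a bug in my code?

Code: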

import numpy as np
from datetime import datetime

converterfunc_time = lambda x: datetime.strptime(x.decode('UTF-8'), '%m/%d/%Y %I:%M:%S %p')

def converterfunc_lat(x):
    # Debug output only; the real conversion is still commented out
    print(x); print(x.decode('UTF-8'))
    #return float(x.decode('utf-8').split('N')[1])

def converterfunc_san(x):
    #print(x)
    return x.decode('UTF-8')


class input_file_processing():
    def __init__(self):
        self.input_data = np.genfromtxt(
            'filename', skip_header=1, dtype=None,
            usecols=(0,1,6,7,8,9,10,13),
            names="Date,SAN,LatDeg,LatMin,LonDeg,LonMin,Beam,EsNo",
            converters={0: converterfunc_time, 1: converterfunc_san, 6: converterfunc_lat},
            delimiter=',')

**Input**

input, file, 1
4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33,00.546,W118,00.638,3,11,1,104,102,82,6,18,2048,4039587
4/2/2015 2:13:55 PM,DSN001000861511,03-01-02,0010416164,0,0,N33,00.883,W118,00.208,3,11,1,106,102,88,6,18,2048,2792940
4/2/2015 2:14:44 PM,DSN001000871692,03-01-04,0010408734,0,0,N33,00.876,W118,00.110,3,11,1,105,102,80,6,18,2048,312623
4/2/2015 2:14:52 PM,DSN001000864906,03-01-05,0010055143,0,0,N33,08.000,W118,03.000,3,11,1,107,99,83,6,18,2048,3056425
4/2/2015 2:15:00 PM,DSN001000838651,03-01-06,0010265541,0,0,N33,09.749,W118,00.317,3,11,1,100,110,74,6,14,2048,3737937
4/2/2015 2:15:08 PM,DSN001000609313,03-01-07,0010152885,0,0,N33,05.854,W118,04.107,3,11,1,94,95,62,6,14,2048,8221318
4/2/2015 2:15:19 PM,DSS31967278,03-01-08,0010350817,0,0,N33,04.551,W118,02.359,3,11,1,127,105,77,6,21,2048,21157710
4/2/2015 2:16:08 PM,DSN001000822728,03-01-10,0010051377,0,0,N33,00.899,W118,00.132,3,11,1,116,95,61,6,19,2048,3526254

Output

b'03-01-01'
03-01-01
b'N33'
N33
b'N33'
N33
b'N33'
N33
b'N33'
N33
b'N33'

Thanks

1 Answer

I'm not entirely sure what is going on, but this script runs:

import numpy as np
from datetime import datetime

txt = b"""input, file, 1
4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33,00.546,W118,00.638,3,11,1,104,102,82,6,18,2048,4039587
4/2/2015 2:13:55 PM,DSN001000861511,03-01-02,0010416164,0,0,N34,00.883,W118,00.208,3,11,1,106,102,88,6,18,2048,2792940
4/2/2015 2:14:44 PM,DSN001000871692,03-01-04,0010408734,0,0,N35,00.876,W118,00.110,3,11,1,105,102,80,6,18,2048,312623
4/2/2015 2:14:52 PM,DSN001000864906,03-01-05,0010055143,0,0,N36,08.000,W118,03.000,3,11,1,107,99,83,6,18,2048,3056425
4/2/2015 2:15:00 PM,DSN001000838651,03-01-06,0010265541,0,0,N33,09.749,W118,00.317,3,11,1,100,110,74,6,14,2048,3737937
4/2/2015 2:15:08 PM,DSN001000609313,03-01-07,0010152885,0,0,N33,05.854,W118,04.107,3,11,1,94,95,62,6,14,2048,8221318
"""
txt = txt.splitlines()
#txt = txt[1:]
txt = txt[:3]
converterfunc_time = lambda x : (datetime.strptime(x.decode('UTF-8'),'%m/%d/%Y %I:%M:%S %p'))
def converterfunc_lat(x):
    print('lat ',x, x.decode('UTF-8'))
    x1 = x.decode('utf-8').split('N')
    if len(x1)>1:
        x1 = float(x1[1])
        print('float',x1)
        return x1
    else:
        print('error')
        return "error"
def converterfunc_san(x):
    #print(x)
    return x.decode('UTF-8')

data = np.genfromtxt(txt, skip_header=1,
                    dtype=None,
                    usecols=(0,1,6,7,8,9,10,13),
                    names="Date,SAN,LatDeg,LatMin,LonDeg,LonMin,Beam,EsNo",
                    delimiter=',')
print(data)
print()
input_data=np.genfromtxt(txt,
            skip_header=1,
            dtype='O,a20,f',
            usecols=(0,1,6,), #(0,1,6,7,8,9,10,13),
            names="Date,SAN,LatDeg,LatMin,LonDeg,LonMin,Beam,EsNo",
            converters={0:converterfunc_time,
                        1:converterfunc_san,
                        6:converterfunc_lat},
            delimiter=',')
print(input_data)

and produces

1552:~/mypy$ python3 stack30269235.py 
[ (b'4/2/2015 2:13:44 PM', b'DSN001000557867', b'N33', 0.546, b'W118', 0.638, 3, 104)
 (b'4/2/2015 2:13:55 PM', b'DSN001000861511', b'N34', 0.883, b'W118', 0.208, 3, 106)]

lat  b'03-01-01' 03-01-01
error
lat  b'N33' N33
float 33.0
lat  b'N34' N34
float 34.0
[(datetime.datetime(2015, 4, 2, 14, 13, 44), b'DSN001000557867', 33.0)
 (datetime.datetime(2015, 4, 2, 14, 13, 55), b'DSN001000861511', 34.0)]

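I had to fill in a few things that were missing from your question.

I added an explicit `dtype` to make sure I got string and float columns.

I modified the lat converter so that it doesn't choke on the '03-01-01' input. ...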
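A minimal sketch of such a tolerant converter, without the debugging prints. Returning `nan` for input that cannot be parsed is an assumption made here (the script above returns the string "error" instead), and `parse_lat_deg` is a hypothetical name.

```python
import math

def parse_lat_deg(raw):
    """Convert a bytes field such as b'N33' to 33.0.

    Returns nan (an assumption, chosen so the column stays numeric)
    when the field does not look like a north latitude, e.g. b'03-01-01'.
    """
    text = raw.decode('utf-8').strip()
    if not text.startswith('N'):
        return math.nan
    try:
        return float(text[1:])
    except ValueError:
        return math.nan

print(parse_lat_deg(b'N33'))       # 33.0
print(parse_lat_deg(b'03-01-01'))  # nan
```

A function like this could be passed in the `converters` dictionary in place of `converterfunc_lat`.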


`genfromtxt` does some sort of test run of your converters:

    # Find the value to test:
    if len(first_line):
        testing_value = first_values[i]
    else:
        testing_value = None
    converters[i].update(conv, locked=True,
                         testing_value=testing_value,
                         default=filling_values[i],
                         missing_values=missing_values[i],)
    uc_update.append((i, conv))

It looks like it is taking the first data line:

4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33

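splitting it on the delimiter, and using the third string, `03-01-01`, as the test value. That is, it uses the position of `6` within your `usecols` parameter (index 2) rather than the column number 6 itself. It is having trouble matching up `usecols`, the converter ids, `names`, and possibly `dtype`.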
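The mismatch can be reproduced by hand. This sketch only mimics the lookup described above in plain Python; it does not call into NumPy, and the variable names are made up for illustration.

```python
first_line = b"4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33"
usecols = (0, 1, 6, 7, 8, 9, 10, 13)

fields = first_line.split(b',')

# Position of column 6 inside usecols, which is where the test value
# appears to be taken from.
position_in_usecols = usecols.index(6)
print(position_in_usecols, fields[position_in_usecols])  # 2 b'03-01-01'

# The field the lat converter was actually meant for.
print(fields[6])  # b'N33'
```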

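The purpose of this test value is to determine the `dtype` of the column's values. That is needed in the `dtype=None` case. It should not be needed if you specify `dtype`, but apparently it still runs the test.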

In my tests where I don't skip columns, there is no problem matching up the converter and the test value.
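If column skipping inside `genfromtxt` is indeed the trigger, one way to sidestep it is to select and convert the columns yourself. The sketch below swaps in the standard-library `csv` module instead of `genfromtxt`, so it is an alternative technique rather than a fix for NumPy; the two-row `sample` string and the simplified converters are assumptions for illustration.

```python
import csv
import io
from datetime import datetime

sample = """input, file, 1
4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33,00.546,W118,00.638,3,11,1,104,102,82,6,18,2048,4039587
4/2/2015 2:13:55 PM,DSN001000861511,03-01-02,0010416164,0,0,N34,00.883,W118,00.208,3,11,1,106,102,88,6,18,2048,2792940
"""

# Converters keyed by the real column number in the file,
# so there is no usecols position to confuse them with.
converters = {
    0: lambda s: datetime.strptime(s, '%m/%d/%Y %I:%M:%S %p'),
    1: str,
    6: lambda s: float(s.lstrip('N')),
}

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header line
rows = [tuple(conv(row[col]) for col, conv in converters.items())
        for row in reader if row]

for r in rows:
    print(r)
```

The resulting list of tuples could then be handed to `np.array` with an explicit structured `dtype` if a NumPy array is needed.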

Answered 2015-05-15T22:57:40.033