4

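In my application I generate a number of values (three columns of type int, str and datetime; see the example below), which are stored in a flat file as comma-separated strings. In addition, I store a file containing the types of the values (see below). Now, how can I use this information to convert the values from the flat file to the correct data types in Python? Is it possible or do I need to do some other stuff?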

Data file:

#id,value,date
1,a,2011-09-13 15:00:00
2,b,2011-09-13 15:10:00
3,c,2011-09-13 15:20:00
4,d,2011-09-13 15:30:00

Type file:

id,<type 'int'>
value,<type 'str'>
date,<type 'datetime.datetime'>
4

7 Answers

4

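As I understand it, you have already parsed the files and now only need to get the right types. Assume id_, type_ and value are three strings containing values from the files (note that type_ should contain 'int', for example, and not "<type 'int'>").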

import importlib

def convert(value, type_):
    try:
        # Check if it's a builtin type.
        # The module is named 'builtins' in Python 3 ('__builtin__' in Python 2).
        module = importlib.import_module('builtins')
        cls = getattr(module, type_)
    except AttributeError:
        # If not, separate module and class, e.g. 'decimal.Decimal'
        module, type_ = type_.rsplit(".", 1)
        module = importlib.import_module(module)
        cls = getattr(module, type_)
    # Call the class with the string value
    return cls(value)

Then you can use it like this:

value = convert("5", "int")

Unfortunately, this does not work for datetime, because a datetime cannot be initialized simply from its string representation.
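One way around the datetime limitation is to look up a parser function by type name instead of calling the class directly. The following is only a sketch under assumptions: the names PARSERS, normalize_type_name and convert_with_parsers are hypothetical, and the date format is taken from the question's data file. It also strips the <type '...'> wrapper used in the question's type file.

```python
import datetime
import re

# Hypothetical lookup table: type name -> callable that parses a string.
PARSERS = {
    "int": int,
    "str": str,
    # Assumes the "YYYY-MM-DD HH:MM:SS" layout from the question's data file.
    "datetime.datetime": lambda text: datetime.datetime.strptime(
        text, "%Y-%m-%d %H:%M:%S"
    ),
}

def normalize_type_name(raw):
    """Turn "<type 'int'>" (or "<class 'int'>") into "int"; pass other names through."""
    match = re.match(r"^<(?:type|class) '([\w.]+)'>$", raw.strip())
    return match.group(1) if match else raw.strip()

def convert_with_parsers(value, type_):
    """Convert a string using the parser registered for the given type name."""
    return PARSERS[normalize_type_name(type_)](value)

print(convert_with_parsers("5", "<type 'int'>"))
print(convert_with_parsers("2011-09-13 15:00:00", "<type 'datetime.datetime'>"))
```

An unknown type name raises KeyError here, which seems preferable to silently guessing a conversion.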

answered 2011-09-13T14:29:03.277
2

Your type file can be simpler:

id=int
value=str
date=datetime.datetime

Then in your main program you can do:

import datetime

def convert_datetime(text):
    return datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# Map the type names used in the type file to converter callables
data_types = {'int': int, 'str': str, 'datetime.datetime': convert_datetime}
fields = {}

with open('example_types.txt') as type_file:
    for line in type_file:
        if not line.strip():
            continue  # skip blank lines
        key, val = line.strip().split('=')
        fields[key] = val

values = []  # store it all here for now

with open('actual_data.txt') as data_file:
    # The header line looks like "#id,value,date"
    field_info = data_file.readline().strip('#\n ').split(',')
    for line in data_file:
        if not line.strip():
            continue  # skip blank lines
        row = []
        for i, element in enumerate(line.strip().split(',')):
            element_type = fields[field_info[i]]  # 'int', 'str', or 'datetime.datetime'
            convert = data_types[element_type]
            row.append(convert(element))
        values.append(row)

# to show it working...
for row in values:
    print(row)
answered 2011-09-13T14:35:30.017
1

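Follow these steps:

  1. Read the file line by line, and perform the following steps for each line.
  2. Split the line using split() with , as the separator.
  3. Convert the first element of the list (from step 2) to an int. Keep the second element as a string. Parse the third value (e.g. using slices) and create a datetime object from it.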
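The steps above can be sketched roughly as follows. The function names parse_line and parse_lines are made up for illustration, and the slice positions assume the fixed-width "YYYY-MM-DD HH:MM:SS" dates shown in the question.

```python
import datetime

def parse_line(line):
    """Split one data line and convert each of the three fields."""
    id_text, value, date_text = line.strip().split(",")
    # Slice the fixed-width "YYYY-MM-DD HH:MM:SS" string into its parts.
    date = datetime.datetime(
        int(date_text[0:4]),    # year
        int(date_text[5:7]),    # month
        int(date_text[8:10]),   # day
        int(date_text[11:13]),  # hour
        int(date_text[14:16]),  # minute
        int(date_text[17:19]),  # second
    )
    return int(id_text), value, date

def parse_lines(lines):
    """Parse every non-empty line that is not a '#' header line."""
    return [
        parse_line(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]

sample = [
    "#id,value,date\n",
    "1,a,2011-09-13 15:00:00\n",
    "2,b,2011-09-13 15:10:00\n",
]
for record in parse_lines(sample):
    print(record)
```

A plain split(",") breaks if the string column can itself contain a comma; in that case the csv module is probably the safer choice.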
answered 2011-09-13T13:31:44.520
1

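I had to deal with a similar situation in a recent program, which had to convert many fields. I used a list of tuples, where one element of each tuple was the conversion function to use. Sometimes it was int or float; sometimes it was a simple lambda; and sometimes it was the name of a function defined elsewhere.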
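A minimal sketch of that idea, applied to the three columns from the question. FIELD_SPEC, parse_date and convert_row are hypothetical names, and the date format is assumed from the question's data file; this is not the answerer's actual code.

```python
import datetime

def parse_date(text):
    """A named function defined elsewhere; assumes the question's date format."""
    return datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# Hypothetical field specification: (column name, conversion function).
FIELD_SPEC = [
    ("id", int),                           # a builtin
    ("value", lambda text: text.strip()),  # a simple lambda
    ("date", parse_date),                  # a function defined elsewhere
]

def convert_row(cells):
    """Apply each column's converter to the cell in the same position."""
    return {name: func(cell) for (name, func), cell in zip(FIELD_SPEC, cells)}

print(convert_row(["3", " c ", "2011-09-13 15:20:00"]))
```

Keeping the converters in one list means adding a column is a one-line change, and the order of the list documents the order of the columns.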

answered 2011-09-13T13:39:22.193
0

Instead of having a separate "type" file, take your list of tuples of (id, value, date) and just pickle it.

Or you'll have to solve the problem of storing your string-to-type converters as text (in your "type" file), which might be a fun problem to solve, but if you're just trying to get something done, go with pickle or cPickle.
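As a rough illustration of the pickle route, the sketch below round-trips a list of (id, value, date) tuples through pickle and gets the original types back with no type file. The sample rows are copied from the question; in Python 3 there is no separate cPickle to import, since pickle uses its C implementation automatically when available.

```python
import datetime
import pickle

rows = [
    (1, "a", datetime.datetime(2011, 9, 13, 15, 0, 0)),
    (2, "b", datetime.datetime(2011, 9, 13, 15, 10, 0)),
]

# Serialize to bytes; pickle.dump(rows, f) with a file opened in binary
# mode would write to disk instead.
payload = pickle.dumps(rows)

# Only unpickle data you trust: loading a pickle can execute arbitrary code.
restored = pickle.loads(payload)

print(restored == rows)
print(type(restored[0][2]))
```

The trade-off is that the stored file is no longer human-readable text, and it can only be read back safely by Python code that trusts its source.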

answered 2011-09-13T13:58:28.937
0

First, you cannot write a "universal" or "smart" conversion that magically handles anything.

Second, trying to summarize a string-to-data conversion in anything other than code never seems to work out well. So rather than write a string that names the conversion, just write the conversion.

Finally, trying to write a configuration file in a domain-specific language is silly. Just write Python code. It's not much more complicated than trying to parse some configuration file.

"Is it possible or do I need to do some other stuff?"

Don't waste time trying to create a "type file" that's not simply Python. It doesn't help. It is simpler to write the conversion as a Python function. You can import that function as if it was your "type file".

import datetime

def convert(row):
    # Convert one row (a dict of strings keyed by column name) to typed values
    return dict(
        id=int(row['id']),
        value=str(row['value']),
        date=datetime.datetime.strptime(row['date'], "%Y-%m-%d %H:%M:%S"),
    )

That's all you have in your "type file".

Now you can read (and process) your input like this.

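from type_file import convert
import csv

# newline="" is what the csv module expects for text files in Python 3
# (Python 2 code opened the file with "rb" instead)
with open("date", newline="") as source:
    rdr = csv.DictReader(source)
    for row in rdr:
        useful_row = convert(row)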

"In many cases I do not know the number of columns or the data type before runtime."

This means you are doomed.

You must have an actual definition of the file content or you cannot do any processing.

"id","value","other value"
1,23507,3

You don't know if "23507" should be an integer, a string, a postal code, a floating-point number (with the period omitted), a duration (in days or seconds) or some other more complex thing. You can't hope and you can't guess.

After getting a definition, you need to write an explicit conversion function based on the actual definition.

After writing the conversion, you need to (a) test the conversion with a simple unit test, and (b) test the data to be sure it really converts.

Then you can process the file.
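The two kinds of testing mentioned above might look roughly like this. It is only a sketch: the convert function is repeated so the example is self-contained, and ConvertTest and find_bad_rows are hypothetical names, not part of the answer.

```python
import datetime
import unittest

def convert(row):
    """Same shape as the conversion function above, repeated for self-containment."""
    return dict(
        id=int(row["id"]),
        value=str(row["value"]),
        date=datetime.datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"),
    )

class ConvertTest(unittest.TestCase):
    """(a) A simple unit test of the conversion itself."""

    def test_valid_row(self):
        row = {"id": "1", "value": "a", "date": "2011-09-13 15:00:00"}
        expected = {
            "id": 1,
            "value": "a",
            "date": datetime.datetime(2011, 9, 13, 15, 0, 0),
        }
        self.assertEqual(convert(row), expected)

    def test_bad_id_is_rejected(self):
        row = {"id": "x", "value": "a", "date": "2011-09-13 15:00:00"}
        with self.assertRaises(ValueError):
            convert(row)

def find_bad_rows(rows):
    """(b) Check the data: return (index, row) pairs that fail to convert."""
    bad = []
    for index, row in enumerate(rows):
        try:
            convert(row)
        except (KeyError, ValueError):
            bad.append((index, row))
    return bad

# Run the unit tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConvertTest)
result = unittest.TextTestRunner().run(suite)
```

Running find_bad_rows over the whole input before processing gives a list of offending rows to inspect, rather than a crash partway through the file.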

answered 2011-09-13T13:59:01.800
0

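You might want to look at the xlrd module. If you can load your data into Excel, and Excel knows the type associated with each column, xlrd will give you the types when you read the Excel file. Of course, if the data is given to you as CSV, someone would have to go into the Excel file and change the column types by hand.

I'm not sure this gets you all the way to where you want to go, but it might help.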

answered 2011-09-13T16:54:07.710