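When using csvkit, I can't prevent character data from being converted to numeric data. In the example below, my first column is converted to "int" even though its values are quoted.
Data: (test.csv)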
"BG_ID_10","DisSens_2010","PrivateNeglect_2010"
"250250001001",0.506632168908,0.363523524561
"250250001004",0.346632168908,0.352456136352
Code snippet:
from csvkit import sql as csvkit_sql
from csvkit import table
from csv import QUOTE_NONNUMERIC

fh = open('test.csv', 'rb')
csv_table = table.Table.from_csv(
    f=fh,
    name='tname',
    delimiter=',',
    quotechar='"',
    snifflimit=0,
)
# Print the type csvkit inferred for each column
for col in csv_table:
    print col.name, col.type
Output:
BG_ID_10 <type 'int'>
DisSens_2010 <type 'float'>
PrivateNeglect_2010 <type 'float'>
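I have a hack that works, but I would appreciate any help with better arguments to "from_csv", or alternative suggestions. (Note that after this step, csvkit commands are used to generate the Postgres CREATE TABLE statement.)
Working hack: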
char_col = csv_table[0]  # get the first column
char_col.type = unicode  # change its declared type
for idx, val in enumerate(char_col):  # force each value to unicode
    char_col[idx] = u'%s' % val
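
For comparison, here is a minimal sketch of the behaviour being asked for, using only the standard library's csv module rather than csvkit. It is not a csvkit solution and assumes Python 3. With quoting=csv.QUOTE_NONNUMERIC, the reader converts unquoted fields to float and leaves quoted fields as strings, so the quoted ID column stays text. The SAMPLE string and the helper name read_preserving_quoted_text are made up for illustration.

```python
import csv
import io

# Same content as test.csv above, inlined so the sketch is self-contained.
SAMPLE = (
    '"BG_ID_10","DisSens_2010","PrivateNeglect_2010"\n'
    '"250250001001",0.506632168908,0.363523524561\n'
    '"250250001004",0.346632168908,0.352456136352\n'
)


def read_preserving_quoted_text(fh):
    """Hypothetical helper: return (header, rows) with quoted fields kept as str.

    QUOTE_NONNUMERIC makes csv.reader convert every unquoted field to float
    and leave every quoted field as a string.
    """
    reader = csv.reader(
        fh,
        delimiter=',',
        quotechar='"',
        quoting=csv.QUOTE_NONNUMERIC,
    )
    header = next(reader)  # header cells are quoted, so they stay strings
    rows = list(reader)
    return header, rows


header, rows = read_preserving_quoted_text(io.StringIO(SAMPLE))

# Report the Python type of each field in the first data row.
for name, value in zip(header, rows[0]):
    print(name, type(value).__name__)
```

This only works when every text field in the file is quoted and every unquoted field is a valid number; an unquoted non-numeric field makes the reader raise ValueError.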