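I'm confused about where the error creeps in. When deployed, the code below produces both integers and booleans in the same column (below). It does not happen with the small data I test on. What is going on here?

In months where the sum of `nutrition` for a `LopNr` does not exceed 1, pandas does not convert `True` to 1? Why not? And in any case, is it safe to manually overwrite the final result this way?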
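If the final `astype(int)` overwrite is kept, it can be made less risky by refusing to cast when a value is missing or is not a whole number. `True`/`False` map to 1/0, which are the intended counts, so only missing or fractional values are real problems. This is a minimal sketch; `to_int_checked` is a hypothetical helper, not a pandas function:

```python
import pandas as pd


def to_int_checked(s: pd.Series) -> pd.Series:
    """Hypothetical helper: cast to int64 only if every value is a present whole number."""
    if s.isna().any():
        raise ValueError("missing values cannot be cast to int")
    as_int = s.astype("int64")
    # True/False become 1/0; a value such as 2.5 would be truncated, so reject it.
    if not (as_int == s.astype("float64")).all():
        raise ValueError("non-integral values in column")
    return as_int


# A column like the one described: bools for small groups, ints for larger ones.
mixed = pd.Series([True, 2, False, 3], dtype=object)
print(to_int_checked(mixed).tolist())  # [1, 2, 0, 3]
```

Running the check on the summed column, instead of passing an option that silently skips a failed cast, means a bad value stops the script rather than ending up in the CSV.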
The data has rows with the relevant columns that look like this:

```
LopNr  DIAGNOS  INDATUMA
1      E12 E14  20050705
```
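For reference, a row like the one above can be parsed in isolation. This sketch assumes the real files are tab-separated (the default for `read_table`) with these three column names; the short pattern `E1[0-4]` is a hypothetical stand-in for `nutrition_codes`:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample row (tab-separated, as assumed above).
sample = "LopNr\tDIAGNOS\tINDATUMA\n1\tE12 E14\t20050705\n"
rows = pd.read_table(io.StringIO(sample))

# INDATUMA is read as an integer such as 20050705, so parse it as YYYYMMDD text.
rows["date"] = pd.to_datetime(rows["INDATUMA"].astype(str), format="%Y%m%d", errors="coerce")
rows["year"] = rows["date"].dt.year
rows["month"] = rows["date"].dt.month

# DIAGNOS can hold several space-separated codes; a match on any of them flags the row.
rows["nutrition"] = rows["DIAGNOS"].str.contains("E1[0-4]")
print(rows[["LopNr", "year", "month", "nutrition"]])
```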
The code is:
```python
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

all_treatments = list()
filelist = ['file1']

# Regex alternation of diagnosis code prefixes: D50-D53, E10-E14, E40-E46, E50-E68
nutrition_codes = '|'.join(
    ["D{}".format(i) for i in range(50, 54)]
    + ["E{}".format(i) for i in range(10, 15)]
    + ["E{}".format(i) for i in range(40, 47)]
    + ["E{}".format(i) for i in range(50, 69)]
)

for file in filelist:
    filename = 'PATH/' + file + '.txt'
    if file[0] == 'o':
        treatments = pd.read_table(filename, usecols=[0, 8, 10])
    elif file[0] == 's':
        treatments = pd.read_table(filename, usecols=[0, 8, 11])
    else:
        # Stop here; otherwise `treatments` would be undefined or left over from the previous file
        raise ValueError("file should start with s or o, no?")
    all_treatments.append(treatments)

all_treatments = pd.concat(all_treatments, ignore_index=True)
# Dates are stored as YYYYMMDD; unparseable values become NaT
all_treatments['date'] = pd.to_datetime(all_treatments['INDATUMA'].astype(str), format='%Y%m%d', errors='coerce')
all_treatments['year'] = all_treatments['date'].dt.year
all_treatments['month'] = all_treatments['date'].dt.month
# Flag rows whose DIAGNOS matches any nutrition code
all_treatments['nutrition'] = all_treatments.DIAGNOS.str.contains(nutrition_codes)
all_treatments = all_treatments.drop(['DIAGNOS', 'INDATUMA', 'date'], axis=1)
# Sum the flags per person and month, then cast the result to int
all_treatments = all_treatments.groupby(['LopNr', 'year', 'month']).sum().astype(int)
all_treatments.to_csv('PATH/treatments_monthly.csv')
```
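One plausible explanation, assuming the full files contain rows with a missing `DIAGNOS` value that the small test data did not: `str.contains` then returns `NaN` for those rows, so `nutrition` becomes an object column (`True`/`False`/`NaN`) instead of a bool column. Summing an object column adds the Python objects themselves, so a group holding a single `True` can come back as `True` rather than 1, while `True + True` gives the int 2. The exact mixed output may vary between pandas versions. This sketch uses hypothetical toy data and a shortened stand-in pattern; passing `na=False` keeps the flags boolean:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: LopNr 3 has a missing DIAGNOS value.
df = pd.DataFrame({
    "LopNr": [1, 2, 2, 3],
    "month": [7, 7, 7, 7],
    "DIAGNOS": pd.Series(["E12 E14", "E11", "E13", np.nan], dtype=object),
})
pattern = "E1[0-4]"  # shortened stand-in for nutrition_codes

# Without na=..., the missing value yields NaN, so the flags are not a bool column.
flags = df["DIAGNOS"].str.contains(pattern)
print(flags.dtype)

# Object sums keep the Python objects, so small groups may stay True/False.
print(df.assign(nutrition=flags).groupby(["LopNr", "month"])["nutrition"].sum().tolist())

# With na=False the flags are bool and every group sum is an integer.
fixed = (
    df.assign(nutrition=df["DIAGNOS"].str.contains(pattern, na=False))
    .groupby(["LopNr", "month"])["nutrition"]
    .sum()
)
print(fixed.tolist())  # [1, 2, 0]
```

Fixing the dtype before the `groupby` avoids having to repair mixed types after the sum.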