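I am confused about where an error creeps in: with the code below, the deployed run produces both integers and booleans in the same column (nutrition, below). This does not happen with the small data set I use for testing. What is happening here?

In months where the sum for a LopNr does not exceed 1, pandas apparently does not convert True to 1. Why not? And in any case, is it safe to override the final result manually in this way?

The data has rows with the relevant columns, like this: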

LopNr      DIAGNOS     INDATUMA
    1      E12 E14     20050705

The code is:

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

all_treatments = list()
filelist = ['file1']

nutrition_codes = '|'.join(["D{}".format(i) for i in range(50, 54)] +  ["E{}".format(i) for i in range(10, 15)] + ["E{}".format(i) for i in range(40, 47)] +  ["E{}".format(i) for i in range(50, 69)])

for file in filelist:
    filename = 'PATH/' + file +'.txt'
    if file[0]=='o':
        treatments = pd.read_table(filename,usecols=[0,8,10])
    elif file[0]=='s':
        treatments = pd.read_table(filename,usecols=[0,8,11])
    else:
        print("file should start with s or o, no?")
        continue  # nothing was read for this file, so there is nothing to append
    all_treatments.append(treatments)

all_treatments = pd.concat(all_treatments, ignore_index=True)
all_treatments['date'] = pd.to_datetime(all_treatments['INDATUMA'].astype(str), coerce=True)
all_treatments['year'] = all_treatments['date'].dt.year
all_treatments['month'] = all_treatments['date'].dt.month
all_treatments['nutrition'] = all_treatments.DIAGNOS.str.contains(nutrition_codes)
all_treatments = all_treatments.drop(['DIAGNOS','INDATUMA','date'], axis=1)
all_treatments = all_treatments.groupby(['LopNr','year','month']).sum().astype(int,copy=False,raise_on_error=False)
all_treatments.to_csv('PATH/treatments_monthly.csv')
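
One way to narrow this down is sketched here. It assumes, without confirmation from the real files, that some rows have a missing DIAGNOS value. The toy frame, the simplified pattern "E1[0-4]", and the variable names are all hypothetical; only `str.contains` with its `na` parameter and `groupby(...).sum()` are actual pandas calls.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the missing DIAGNOS in the last row is an assumption.
toy = pd.DataFrame({
    "LopNr": [1, 1, 2],
    "DIAGNOS": pd.Series(["E12 E14", "X99", np.nan], dtype=object),
})

# On an object column with a missing value, str.contains propagates the NaN,
# so the result can mix booleans and NaN instead of being a plain bool column.
mixed = toy["DIAGNOS"].str.contains("E1[0-4]")

# na=False treats missing rows as "no match" and yields a bool column.
clean = toy["DIAGNOS"].str.contains("E1[0-4]", na=False)

# Summing a bool column per group counts the True values as integers.
counts = clean.groupby(toy["LopNr"]).sum()

print(mixed.dtype, clean.dtype)
print(counts)
```

If the dtype of the real nutrition column turns out to be object rather than bool before the groupby, missing values would be a plausible place to look first. That is a guess to check against the deployed data, not a diagnosis.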