I am trying to extract monthly information from tables in PDF files. I wrote code that works well, but it fails for certain months. This is what I did:
import pandas as pd
from tabula import read_pdf  # assumption: read_pdf is the one from tabula-py

def provincias(codigo, pages: list):
    # Scale the extraction area by fc (presumably cm to PDF points)
    box = [4, 2.8, 19, 27]
    fc = 28.28
    for i in range(0, len(box)):
        box[i] *= fc
    path = "http://www.bcra.gob.ar/Pdfs/PublicacionesEstadisticas/BoletinEstadistico/" + 'boldat' + str(codigo) + ".pdf"
    tables = read_pdf(path, pages=pages, area=[box], stream=True)
    tablas1 = pd.DataFrame()
    tablas2 = pd.DataFrame()
    for page in pages:
        if page % 2 != 0:
            # Odd pages: first half of the provinces
            tablas1 = pd.concat(tables)
            tablas1 = tablas1.drop(columns=['(6)(7)','Unnamed: 1','Unnamed: 2','Unnamed: 4'])
            tablas1 = tablas1.rename(columns={'Unnamed: 0':'Actividad','Unnamed: 3':'Capital Federal','Aires':'Gran Buenos Aires','Unnamed: 5':'Resto Bs As',
                                              'Unnamed: 6':'Catamarca','Unnamed: 7':'Cordoba','Unnamed: 8':'Corrientes','Unnamed: 9':'Chaco','Unnamed: 10':'Chubut',
                                              'Unnamed: 11':'Entre Rios','Unnamed: 12':'Formosa','Unnamed: 13':'Jujuy'})
            tablas1['ID'] = range(len(tablas1))
            return tablas1
        else:
            # Even pages: second half of the provinces; the column layout depends on the table width
            for tabla in tables:
                if len(tabla.columns) <= 17:
                    tabla = tabla.drop(columns=['(6)(7)','Unnamed: 1'])
                    tabla = tabla.rename(columns={'Unnamed: 0':'Actividad','Unnamed: 2':'La Pampa','Unnamed: 3':'La Rioja','Unnamed: 4':'Mendoza',
                                                  'Unnamed: 5':'Misiones','Unnamed: 6':'Neuquen','Unnamed: 7':'Rio Negro','Unnamed: 8':'Salta',
                                                  'Unnamed: 9':'San Juan','Unnamed: 10':'San Luis','Unnamed: 11':'Santa Cruz','Unnamed: 12':'Santa Fe',
                                                  'Estero':'Santiago del Estero','Fuego':'Tierra del Fuego','Unnamed: 13':'Tucuman'})
                    tabla = pd.DataFrame(tabla)
                    tablas2 = tablas2.append(tabla)
                elif len(tabla.columns) > 17:
                    tabla = tabla.drop(columns=['(6)(7)','Unnamed: 1','Unnamed: 13'])
                    tabla = tabla.rename(columns={'Unnamed: 0':'Actividad','Unnamed: 2':'La Pampa','Unnamed: 3':'La Rioja','Unnamed: 4':'Mendoza',
                                                  'Unnamed: 5':'Misiones','Unnamed: 6':'Neuquen','Unnamed: 7':'Rio Negro','Unnamed: 8':'Salta',
                                                  'Unnamed: 9':'San Juan','Unnamed: 10':'San Luis','Unnamed: 11':'Santa Cruz','Unnamed: 12':'Santa Fe',
                                                  'Estero':'Santiago del Estero','Fuego':'Tierra del Fuego','Unnamed: 14':'Tucuman'})
                    tabla = pd.DataFrame(tabla)
                    tablas2 = tablas2.append(tabla)
                    tablas2 = tablas2.drop(columns=['Actividad'])
                    tablas2['ID'] = range(len(tablas2))
                    return tablas2
Then, if I run code like this, it works fine:
enero2020 = provincias(202001, pages=[307,309]).merge(provincias(202001, pages=[308,310]), how='left', on='ID').drop(columns=['ID'])
febrero2020 = provincias(202002, pages=[307,309]).merge(provincias(202002, pages=[308,310]), how='left', on='ID').drop(columns=['ID'])
However, I have a problem with the June data. If I run this:
junio2021 = provincias(202106, pages=[313,315]).merge(provincias(202106, pages=[314,316]), how='left', on='ID').drop(columns=['ID'])
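it gives me the error in the title: Can only merge Series or DataFrame objects, a <class 'NoneType'> was passed. I can see that the problem shows up in the merge step, but I have tried many things and could not solve it.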
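To narrow down which side of the merge is None, something like the sketch below might help. `merge_on_id` is a hypothetical helper I made up for illustration, and the two small DataFrames are toy stand-ins for what `provincias` returns, not real data from the PDFs.

```python
import pandas as pd

def merge_on_id(left, right):
    """Hypothetical helper: merge two results on 'ID', naming whichever side is None."""
    for name, obj in (("left", left), ("right", right)):
        if obj is None:
            raise ValueError(f"{name} side is None: the function that built it never reached a return")
    return left.merge(right, how="left", on="ID").drop(columns=["ID"])

# Toy stand-ins for the odd-page and even-page results
odd = pd.DataFrame({"ID": [0, 1], "Catamarca": [1.0, 2.0]})
even = pd.DataFrame({"ID": [0, 1], "Mendoza": [3.0, 4.0]})
merged = merge_on_id(odd, even)

# Passing None on the right reports which side is missing
try:
    merge_on_id(odd, None)
    message = ""
except ValueError as exc:
    message = str(exc)
print(message)
```

With the real function, the two arguments would be the results of the two `provincias(...)` calls, stored in separate variables instead of chained in one line.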
I noticed that in the "junio2021" case, the two pages [314,316] have the same number of columns, but I don't know whether that is the problem.
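To test that hunch without the PDFs, here is a minimal sketch. It assumes that the `return` in the even-page branch is only reached after a table with more than 17 columns has been seen; `combine_wide_only` and `make_table` are hypothetical stand-ins, not my real function, and the tables hold dummy values.

```python
import pandas as pd

def combine_wide_only(tables, threshold=17):
    """Hypothetical stand-in for the even-page branch:
    it returns only once a table wider than `threshold` columns shows up."""
    parts = []
    for table in tables:
        parts.append(table)
        if len(table.columns) > threshold:
            combined = pd.concat(parts)
            combined["ID"] = range(len(combined))
            return combined
    # No return here: if no table is wide enough, Python returns None implicitly

def make_table(n_cols, n_rows=2):
    """Build a dummy table with the given number of columns."""
    return pd.DataFrame({f"c{i}": range(n_rows) for i in range(n_cols)})

mixed = combine_wide_only([make_table(17), make_table(18)])  # one wide table
same = combine_wide_only([make_table(17), make_table(17)])   # same width on both pages

left = pd.DataFrame({"ID": range(4), "x": range(4)})
try:
    left.merge(same, how="left", on="ID")
    merge_error = None
except TypeError as exc:
    merge_error = str(exc)
print(type(mixed).__name__, same, merge_error)
```

If this is what happens with the June file, printing `len(tabla.columns)` for each table read from pages 314 and 316 should show that neither one is wider than 17 columns.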
Thanks!!