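I wrote a script that takes the company number field (`cnpj`) from a "company partners" file and compares it against a file listing the companies of a specific state, writing the matches to a third file (end result: the partners of every company in that state). The code is simple: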
import pandas as pd
dsocio = pd.read_csv('D:/CNPJ-full-master/cnpj-csv/socios.csv', chunksize=262144, low_memory=False)
duf = pd.read_csv('D:/pyData/receita/empES.csv', usecols = ['cnpj'], low_memory=False)
for chunk in dsocio:
    result = chunk[chunk['cnpj'].isin(duf.cnpj)]
    result.to_csv('D:/CNPJ-full-master/cnpj-csv/UFs/socioES.csv', index=False, header=True, mode='a')
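The problem is that I have two versions of the "empES.csv" file. They have different numbers of columns, but both have the "cnpj" field as the first column, which is the only field I need. When I run the code with the version 1 file, it works perfectly. But when I use version 2, my output file gets filled with nothing but headers. Lots of rows containing only the header!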
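The header-only symptom can be reproduced in isolation. The sketch below is only an illustration with made-up data (the `chunks` list, the `wanted` keys and the in-memory buffer are all hypothetical stand-ins, not my real files): whenever the filter matches nothing, `to_csv(..., header=True)` still writes the header row, once per chunk.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the chunks read from socios.csv.
chunks = [
    pd.DataFrame({"cnpj": [191, 2135], "tipo_socio": [2, 2]}),
    pd.DataFrame({"cnpj": [2216, 2569], "tipo_socio": [2, 2]}),
    pd.DataFrame({"cnpj": [2640, 2640], "tipo_socio": [2, 2]}),
]
wanted = pd.Series([999])  # hypothetical key list that matches nothing

out = io.StringIO()  # in-memory stand-in for the output file opened in append mode
for chunk in chunks:
    result = chunk[chunk["cnpj"].isin(wanted)]    # empty for every chunk
    result.to_csv(out, index=False, header=True)  # the header row is still written

lines = out.getvalue().splitlines()
print(lines)  # three identical header lines and no data rows
```

So an output made only of headers suggests that every chunk is producing an empty `result`, which points at the comparison itself rather than at the writing step.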
Here are some snippets of the first lines of each file:
- The partners file (socios.csv, the file I copy the matching rows from):
'''
"cnpj","tipo_socio","nome_socio","cnpj_cpf_socio","cod_qualificacao","perc_capital","data_entrada","cod_pais_ext","nome_pais_ext","cpf_repres","nome_repres","cod_qualif_repres"
"00000000000191","2","MARCIO HAMILTON FERREIRA","* 923641 ","10",0.0,"20101117","","","","","00"
"00000000000191","2","NILSON MARTINIANO MOREIRA","* 491386 ","10",0.0,"20101117","","","","","00"
"00000000002135","2","DEBORA CRISTINA FONSECA","* 314628 ","08",0.0,"20200312","","","","","00"
"00000000002216","2","WALDERY RODRIGUES JUNIOR","* 025913 ","08",0.0,"20200312","","","","","00"
"00000000002216","2","ERIK DA COSTA BREYER","* 093217 ","10",0.0,"20191209","","","","","00"
"00000000002216","2","THOMPSON SOARES PEREIRA CESAR","* 503187 ","10",0.0,"20191209","","","","","00"
"00000000002569","2","WALTER MALIENI JUNIOR","* 718468 ","10",0.0,"20101117","","","","","00"
"00000000002569","2","NILSON MARTINIANO MOREIRA","* 491386 ","10",0.0,"20101117","","","","","00"
"00000000002640","2","WALDERY RODRIGUES JUNIOR","* 025913 ","08",0.0,"20200312","","","","","00"
'''
- The companies file that works (empES.csv), from which I only read the "cnpj" field:
'''
cnpj,identificador_matriz_filial,razao_social,nome_fantasia,situacao_cadastral,data_situacao_cadastral,motivo_situacao_cadastral,nome_cidade_exterior,codigo_natureza_juridica,data_inicio_atividade,cnae_fiscal,descricao_tipo_logradouro,logradouro,numero,complemento,bairro,cep,uf,codigo_municipio,municipio,ddd_telefone_1,ddd_telefone_2,ddd_fax,qualificacao_do_responsavel,capital_social,porte,opcao_pelo_simples,data_opcao_pelo_simples,data_exclusao_do_simples,opcao_pelo_mei,situacao_especial,data_situacao_especial
2135,2,BANCO DO BRASIL SA,VITORIA - ES,2,2005-11-03,0,,2038,1966-08-01,6421200,PRACA,PIO XII,30,,CENTRO,29010340.0,ES,5705,VITORIA,,,,10,0.0,5,0,,,0,,
8338,2,BANCO DO BRASIL SA,CACHOEIRO DE ITAPEMIRIM-ES-EST UNIF,2,2005-11-03,0,,2038,1966-08-01,6421200,PRACA,JERONIMO MONTEIRO,26,,CENTRO,29300902.0,ES,5623,CACHOEIRO DE ITAPEMIRIM,,,,10,0.0,5,0,,,0,,
11207,2,BANCO DO BRASIL SA,COLATINA-ES-EST.UNIF,2,2005-11-03,0,,2038,1966-08-01,6421200,RUA,EXPED ABILIO DOS SANTOS,124,,CENTRO,29700070.0,ES,5629,COLATINA,,,,10,0.0,5,0,,,0,,
18643,2,BANCO DO BRASIL SA,,2,2005-11-03,0,,2038,1966-08-01,6421200,RUA,PRESIDENTE VARGAS,29,,CENTRO,29400000.0,ES,5667,MIMOSO DO SUL,,,,10,0.0,5,0,,,0,,
19615,2,BANCO DO BRASIL SA,,2,2005-11-03,0,,2038,1982-05-04,6421200,AVENIDA,SENADOR EURICO RESENDE,994,,CENTRO,29845000.0,ES,5619,BOA ESPERANCA,,,,10,0.0,5,0,,,0,,
20974,2,BANCO DO BRASIL SA,SANTA TERESA ES-EST UNIF,2,2005-11-03,0,,2038,1966-08-01,6421200,RUA,JERONIMO VERVLOET,178,,CENTRO,29650000.0,ES,5691,SANTA TERESA,,,,10,0.0,5,0,,,0,,
'''
- The new companies file (empES.csv), the one that gives me the strange behavior:
'''
cnpj,matriz_filial,razao_social,nome_fantasia,situacao,data_situacao,motivo_situacao,nm_cidade_exterior,cod_pais,nome_pais,cod_nat_juridica,data_inicio_ativ,cnae_fiscal,tipo_logradouro,logradouro,numero,complemento,bairro,cep,uf,cod_municipio,municipio,ddd_1,telefone_1,ddd_2,telefone_2,ddd_fax,num_fax,email,qualif_resp,capital_social,porte,opc_simples,data_opc_simples,data_exc_simples,opc_mei,sit_especial,data_sit_especial
2135,2,BANCO DO BRASIL SA,VITORIA - ES,2,20051103,0,,,,2038,19660801,6421200,PRACA,PIO XII,30,,CENTRO,29010340.0,ES,5705,VITORIA,,,,,,,AGE0021@BB.COM.BR,10,0.0,5,0,,,,,
8338,2,BANCO DO BRASIL SA,CACHOEIRO DE ITAPEMIRIM-ES-EST UNIF,2,20051103,0,,,,2038,19660801,6421200,PRACA,JERONIMO MONTEIRO,26,,CENTRO,29300902.0,ES,5623,CACHOEIRO DE ITAPEMIRIM,,,,,,,,10,0.0,5,0,,,,,
11207,2,BANCO DO BRASIL SA,COLATINA-ES-EST.UNIF,2,20051103,0,,,,2038,19660801,6421200,RUA,EXPED ABILIO DOS SANTOS,124,,CENTRO,29700070.0,ES,5629,COLATINA,,,,,,,,10,0.0,5,0,,,,,
18643,2,BANCO DO BRASIL SA,,2,20051103,0,,,,2038,19660801,6421200,RUA,PRESIDENTE VARGAS,29,,CENTRO,29400000.0,ES,5667,MIMOSO DO SUL,,,,,,,,10,0.0,5,0,,,,,
19615,2,BANCO DO BRASIL SA,,2,20051103,0,,,,2038,19820504,6421200,AVENIDA,SENADOR EURICO RESENDE,994,,CENTRO,29845000.0,ES,5619,BOA ESPERANCA,,,,,,,,10,0.0,5,0,,,,,
20974,2,BANCO DO BRASIL SA,SANTA TERESA ES-EST UNIF,2,20051103,0,,,,2038,19660801,6421200,RUA,JERONIMO VERVLOET,178,,CENTRO,29650000.0,ES,5691,SANTA TERESA,,,,,,,,10,0.0,5,0,,,,,
22241,2,BANCO DO BRASIL SA,SAO MATEUS ES EST UNIF,2,20051103,0,,,,2038,19660801,6421200,AVENIDA,JONES DOS SANTOS NEVES,324,,CENTRO,29930010.0,ES,5697,SAO MATEUS,,,,,,,,10,0.0,5,0,,,,,
28100,2,BANCO DO BRASIL SA,,2,20051103,0,,,,2038,19660801,6421200,AVENIDA,JERONIMO MONTEIRO,38/46,,CENTRO,29500000.0,ES,5603,ALEGRE,,,,,,,,10,0.0,5,0,,,,,
37001,2,BANCO DO BRASIL SA,,2,20051103,0,,,,2038,19660801,6421200,RUA,DEMERVAL AMARAL,35,,CENTRO,29560000.0,ES,5645,GUACUI,,,,,,,,10,0.0,5,0,,,,,
'''
This is a sample of the output when I pass the first empES.csv file:
'''
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
2135,2,WALDERY RODRIGUES JUNIOR,* 025913 ,8,0.0,20200312,,,,,0
2135,2,ERIK DA COSTA BREYER,* 093217 ,10,0.0,20191209,,,,,0
2135,2,THOMPSON SOARES PEREIRA CESAR,* 503187 ,10,0.0,20191209,,,,,0
2135,2,MAURICIO NOGUEIRA,* 894537 ,10,0.0,20191209,,,,,0
2135,2,DANIEL ANDRE STIELER,* 145110 ,10,0.0,20190910,,,,,0
2135,2,ENIO MATHIAS FERREIRA,* 078106 ,10,0.0,20181107,,,,,0
2135,2,RONALDO SIMON FERREIRA,* 685018 ,10,0.0,20190729,,,,,0
2135,2,IVANDRE MONTIEL DA SILVA,* 975660 ,10,0.0,20190403,,,,,0
2135,2,FABIO AUGUSTO CANTIZANI BARBOSA,* 379967 ,10,0.0,20190403,,,,,0
2135,2,CARLOS MOTTA DOS SANTOS,* 876287 ,10,0.0,20190403,,,,,0
2135,2,CAMILO BUZZI,* 569178 ,10,0.0,20190403,,,,,0
'''
And this is what happens when I use the other "empES.csv" file:
'''
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
'''
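...and it goes on like this forever.
I have no idea why the first file works fine with this code while the second one produces this output. It is as if `.isin` were not matching anything in this case!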
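One thing that may be worth checking, and this is only a guess rather than a confirmed diagnosis, is whether the `cnpj` column is parsed with the same type from both files, since `isin` compares values as they are and an integer `2135` does not equal the string `"2135"`. The sketch below uses two tiny made-up CSVs in place of the real files and a hypothetical helper, `read_keys`, that forces `cnpj` to a string zero-padded to 14 characters (the length of a CNPJ) before comparing.

```python
import io
import pandas as pd

# Hypothetical miniature versions of socios.csv and empES.csv.
socios_csv = io.StringIO(
    '"cnpj","nome_socio"\n"00000000002135","A"\n"00000000002216","B"\n'
)
emp_csv = io.StringIO("cnpj,razao_social\n2135,BANCO DO BRASIL SA\n")


def read_keys(buf, **kwargs):
    """Read a CSV, keeping cnpj as text and zero-padding it to 14 characters."""
    df = pd.read_csv(buf, dtype={"cnpj": str}, **kwargs)
    df["cnpj"] = df["cnpj"].str.strip().str.zfill(14)
    return df


socios = read_keys(socios_csv)
emp = read_keys(emp_csv, usecols=["cnpj"])

# Compare the dtypes of the two key columns before relying on isin.
print(socios["cnpj"].dtype, emp["cnpj"].dtype)

matches = socios[socios["cnpj"].isin(emp["cnpj"])]
print(matches)
```

If the dtypes printed for the real files differ between version 1 and version 2 of empES.csv, that would be consistent with every chunk coming back empty.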
Any ideas?
PS: All the data shown here is in the public domain, from the Brazilian government.