
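I wrote some code that takes the company-number field (cnpj) from a "company partners" file, compares it with a file listing the companies of a specific state, and writes the matching rows to a third file (end result: the partners of all companies in that state). The code is simple: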

import pandas as pd

dsocio = pd.read_csv('D:/CNPJ-full-master/cnpj-csv/socios.csv', chunksize=262144, low_memory=False)
duf = pd.read_csv('D:/pyData/receita/empES.csv', usecols = ['cnpj'], low_memory=False)

for chunk in dsocio:
    result = chunk[chunk['cnpj'].isin(duf.cnpj)]
    result.to_csv('D:/CNPJ-full-master/cnpj-csv/UFs/socioES.csv', index=False, header=True, mode='a')

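The problem is that I have two versions of the "empES.csv" file. They have a different number of columns, but both have the "cnpj" field as the first column, and that is the only field I need. When I run the code with version 1 of the file, it works perfectly. But when I try version 2, my output file fills up with nothing but headers. Many lines of headers!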
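One mechanism that produces exactly this symptom, shown with a small made-up sample (SOCIOS_CSV, the keys and the temporary path are illustrative, not the real files): to_csv with mode='a' and header=True appends the column names on every call, even when the filtered chunk has zero rows. If no chunk of socios.csv matches anything in empES.csv, the output is therefore nothing but header lines. Writing the header only for the first chunk removes the repeats, although it does not explain why nothing matches.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for socios.csv (the question reads the real file with chunksize=262144)
SOCIOS_CSV = (
    '"cnpj","nome_socio"\n'
    '"00000000002135","A"\n'
    '"00000000002216","B"\n'
    '"00000000002569","C"\n'
)


def append_matches(keys, header_every_chunk):
    """Filter SOCIOS_CSV one chunk at a time, append matches to a file, return the file's text."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "socioES.csv")
        chunks = pd.read_csv(io.StringIO(SOCIOS_CSV), chunksize=1)
        for i, chunk in enumerate(chunks):
            result = chunk[chunk["cnpj"].isin(keys)]
            # With header=True the column names are written even when result has no rows
            write_header = header_every_chunk or i == 0
            result.to_csv(path, index=False, header=write_header, mode="a")
        with open(path) as f:
            return f.read()


# No key matches: header=True on every chunk yields one header line per chunk and no data
print(append_matches([9999], header_every_chunk=True))
# Writing the header only for the first chunk leaves a single header line
print(append_matches([9999], header_every_chunk=False))
```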

Here are snippets of the first lines of each file:

  1. Partners file (socios.csv, the file I copy the matching rows from):

'''
"cnpj","tipo_socio","nome_socio","cnpj_cpf_socio","cod_qualificacao","perc_capital","data_entrada","cod_pais_ext","nome_pais_ext","cpf_repres","nome_repres","cod_qualif_repres"
"00000000000191","2","MARCIO HAMILTON FERREIRA","* 923641 ","10",0.0,"20101117","","","","","00"
"00000000000191","2","NILSON MARTINIANO MOREIRA","* 491386 ","10",0.0,"20101117","","","","","00"
"00000000002135","2","DEBORA CRISTINA FONSECA ","* 314628 ","08",0.0,"20200312","","","","","00"
"00000000002216","2","WALDERY RODRIGUES JUNIOR","* 025913 ","08",0.0,"20200312","","","","","00"
"00000000002216","2","ERIK DA COSTA BREYER","* 093217 ","10",0.0,"20191209","","","","","00"
"00000000002216","2","THOMPSON SOARES PEREIRA CESAR","* 503187 ","10",0.0,"20191209","","","","","00"
"00000000002569","2","WALTER MALIENI JUNIOR","* 718468","10",0.0,"20101117","","","","","00"
"00000000002569","2","NILSON MARTINIANO MOREIRA","* 491386 ","10",0.0,"20101117","","","","","00"
"00000000002640","2","WALDERY RODRIGUES JUNIOR","* 025913 ","08",0.0,"20200312","","","","","00"
'''

  2. Company file that works (empES.csv), from which I read only the "cnpj" field:

'''
cnpj,identificador_matriz_filial,razao_social,nome_fantasia,situacao_cadastral,data_situacao_cadastral,motivo_situacao_cadastral,nome_cidade_exterior,codigo_natureza_juridica,data_inicio_atividade,cnae_fiscal,descricao_tipo_logradouro,logradouro,numero,complemento,bairro,cep,uf,codigo_municipio,municipio,ddd_telefone_1,ddd_telefone_2,ddd_fax,qualificacao_do_responsavel,capital_social,porte,opcao_pelo_simples,data_opcao_pelo_simples,data_exclusao_do_simples,opcao_pelo_mei,situacao_especial,data_situacao_especial
2135,2,BANCO DO BRASIL SA,VITORIA - ES,2,2005-11-03,0,,2038,1966-08-01,6421200,PRACA,PIO XII,30,,CENTRO,29010340.0,ES,5705,VITORIA,,,,10,0.0,5,0,,,0,,
8338,2,BANCO DO BRASIL SA,CACHOEIRO DE ITAPEMIRIM-ES-EST UNIF,2,2005-11-03,0,,2038,1966-08-01,6421200,PRACA,JERONIMO MONTEIRO,26,,CENTRO,29300902.0,ES,5623,CACHOEIRO DE ITAPEMIRIM,,,,10,0.0,5,0,,,0,,
11207,2,BANCO DO BRASIL SA,COLATINA-ES-EST.UNIF,2,2005-11-03,0,,2038,1966-08-01,6421200,RUA,EXPED ABILIO DOS SANTOS,124,,CENTRO,29700070.0,ES,5629,COLATINA,,,,10,0.0,5,0,,,0,,
18643,2,BANCO DO BRASIL SA,,2,2005-11-03,0,,2038,1966-08-01,6421200,RUA,PRESIDENTE VARGAS,29,,CENTRO,29400000.0,ES,5667,MIMOSO DO SUL,,,,10,0.0,5,0,,,0,,
19615,2,BANCO DO BRASIL SA,,2,2005-11-03,0,,2038,1982-05-04,6421200,AVENIDA,SENADOR EURICO RESENDE,994,,CENTRO,29845000.0,ES,5619,BOA ESPERANCA,,,,10,0.0,5,0,,,0,,
20974,2,BANCO DO BRASIL SA,SANTA TERESA ES-EST UNIF,2,2005-11-03,0,,2038,1966-08-01,6421200,RUA,JERONIMO VERVLOET,178,,CENTRO,29650000.0,ES,5691,SANTA TERESA,,,,10,0.0,5,0,,,0,,
'''

  3. New company file (empES.csv), which gives me the strange behavior:

'''
cnpj,matriz_filial,razao_social,nome_fantasia,situacao,data_situacao,motivo_situacao,nm_cidade_exterior,cod_pais,nome_pais,cod_nat_juridica,data_inicio_ativ,cnae_fiscal,tipo_logradouro,logradouro,numero,complemento,bairro,cep,uf,cod_municipio,municipio,ddd_1,telefone_1,ddd_2,telefone_2,ddd_fax,num_fax,email,qualif_resp,capital_social,porte,opc_simples,data_opc_simples,data_exc_simples,opc_mei,sit_especial,data_sit_especial
2135,2,BANCO DO BRASIL SA,VITORIA - ES,2,20051103,0,,,,2038,19660801,6421200,PRACA,PIO XII,30,,CENTRO,29010340.0,ES,5705,VITORIA,,,,,,,AGE0021@BB.COM.BR,10,0.0,5,0,,,,,
8338,2,BANCO DO BRASIL SA,CACHOEIRO DE ITAPEMIRIM-ES-EST UNIF,2,20051103,0,,,,2038,19660801,6421200,PRACA,JERONIMO MONTEIRO,26,,CENTRO,29300902.0,ES,5623,CACHOEIRO DE ITAPEMIRIM,,,,,,,,10,0.0,5,0,,,,,
11207,2,BANCO DO BRASIL SA,COLATINA-ES-EST.UNIF,2,20051103,0,,,,2038,19660801,6421200,RUA,EXPED ABILIO DOS SANTOS,124,,CENTRO,29700070.0,ES,5629,COLATINA,,,,,,,,10,0.0,5,0,,,,,
18643,2,BANCO DO BRASIL SA,,2,20051103,0,,,,2038,19660801,6421200,RUA,PRESIDENTE VARGAS,29,,CENTRO,29400000.0,ES,5667,MIMOSO DO SUL,,,,,,,,10,0.0,5,0,,,,,
19615,2,BANCO DO BRASIL SA,,2,20051103,0,,,,2038,19820504,6421200,AVENIDA,SENADOR EURICO RESENDE,994,,CENTRO,29845000.0,ES,5619,BOA ESPERANCA,,,,,,,,10,0.0,5,0,,,,,
20974,2,BANCO DO BRASIL SA,SANTA TERESA ES-EST UNIF,2,20051103,0,,,,2038,19660801,6421200,RUA,JERONIMO VERVLOET,178,,CENTRO,29650000.0,ES,5691,SANTA TERESA,,,,,,,,10,0.0,5,0,,,,,
22241,2,BANCO DO BRASIL SA,SAO MATEUS ES EST UNIF,2,20051103,0,,,,2038,19660801,6421200,AVENIDA,JONES DOS SANTOS NEVES,324,,CENTRO,29930010.0,ES,5697,SAO MATEUS,,,,,,,,10,0.0,5,0,,,,,
28100,2,BANCO DO BRASIL SA,,2,20051103,0,,,,2038,19660801,6421200,AVENIDA,JERONIMO MONTEIRO,38/46,,CENTRO,29500000.0,ES,5603,ALEGRE,,,,,,,,10,0.0,5,0,,,,,
37001,2,BANCO DO BRASIL SA,,2,20051103,0,,,,2038,19660801,6421200,RUA,DEMERVAL AMARAL,35,,CENTRO,29560000.0,ES,5645,GUACUI,,,,,,,,10,0.0,5,0,,,,,
'''

This is a sample of the output when I pass the first empES.csv file:

'''
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
2135,2,WALDERY RODRIGUES JUNIOR,* 025913 ,8,0.0,20200312,,,,,0
2135,2,ERIK DA COSTA BREYER,* 093217 ,10,0.0,20191209,,,,,0
2135,2,THOMPSON SOARES PEREIRA CESAR,* 503187 ,10,0.0,20191209,,,,,0
2135,2,MAURICIO NOGUEIRA,* 894537 ,10,0.0,20191209,,,,,0
2135,2,DANIEL ANDRE STIELER,* 145110 ,10,0.0,20190910,,,,,0
2135,2,ENIO MATHIAS FERREIRA,* 078106 ,10,0.0,20181107,,,,,0
2135,2,RONALDO SIMON FERREIRA,* 685018 ,10,0.0,20190729,,,,,0
2135,2,IVANDRE MONTIEL DA SILVA,* 975660 ,10,0.0,20190403,,,,,0
2135,2,FABIO AUGUSTO CANTIZANI BARBOSA,* 379967 ,10,0.0,20190403,,,,,0
2135,2,CARLOS MOTTA DOS SANTOS,* 876287,10,0.0,20190403,,,,,0
2135,2,CAMILO BUZZI,* 569178 ,10,0.0,20190403,,,,,0
'''
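Note that the output shows cnpj as 2135, while socios.csv stores it quoted and zero-padded ("00000000002135"). By default read_csv still infers a numeric type for quoted fields, so the padding is dropped and the key becomes an integer; the version 1 keys are plain integers too, which is why they match. A small sketch with inline sample data (the sample strings and the zfill normalization are illustrative assumptions, not taken from the question):

```python
import io

import pandas as pd

sample = '"cnpj","tipo_socio"\n"00000000002135","2"\n"00000000002216","2"\n'

# Default parsing: quoted digits are still converted to numbers, leading zeros are lost
as_int = pd.read_csv(io.StringIO(sample))
print(as_int["cnpj"].tolist())  # [2135, 2216]

# Alternative: keep the key as text; the other file must then use the same 14-digit padding
as_str = pd.read_csv(io.StringIO(sample), dtype={"cnpj": str})
print(as_str["cnpj"].tolist())  # ['00000000002135', '00000000002216']

# Hypothetical normalization for keys stored without padding (e.g. "2135" in empES.csv)
padded = pd.Series(["2135", "8338"]).str.zfill(14)
print(padded.tolist())  # ['00000000002135', '00000000008338']
```

Whichever representation is chosen, both files need to end up with the same one before calling isin.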

When I try the other "empES.csv" file, this happens:

'''
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
cnpj,tipo_socio,nome_socio,cnpj_cpf_socio,cod_qualificacao,perc_capital,data_entrada,cod_pais_ext,nome_pais_ext,cpf_repres,nome_repres,cod_qualif_repres
'''

...and it goes on like this forever.

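I don't understand why the first file works fine with the code and why the second one produces that output. It's as if .isin were not iterating in this case!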
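One way isin can "do nothing" without raising an error is a type mismatch between the two key columns. In this sketch (hypothetical values), the partner keys are integers, as pandas parses them from socios.csv, while the state keys are strings; isin does not convert between the two, so every comparison comes out False:

```python
import pandas as pd

partners = pd.Series([2135, 2216, 2569])     # cnpj as parsed from socios.csv: integers
keys_numeric = pd.Series([2135, 8338])       # keys parsed as numbers (like version 1)
keys_text = pd.Series(["2135", "8338"])      # hypothetical keys parsed as text

print(partners.isin(keys_numeric).tolist())  # [True, False, False]
print(partners.isin(keys_text).tolist())     # [False, False, False]

# Printing both dtypes before filtering makes the mismatch visible
print(partners.dtype, keys_text.dtype)
```

Checking duf.cnpj.dtype for each version of empES.csv would show whether this is what happens with the real files.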

Any ideas?

PS: all the data shown here is public domain, published by the Brazilian government.


1 Answer


Well, in the end it was a column with wrong values. Basically, I exported a file containing only the "cnpj" column:

import pandas as pd
duf = pd.read_csv('D:/CNPJ-full-master/cnpj-csv/UFs/empES.csv', usecols = ['cnpj'], low_memory=False)
duf.to_csv('D:/CNPJ-full-master/cnpj-csv/UFs/empES-cnpj.csv', index=False)
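As an alternative to scanning the exported file by eye, the non-numeric values can be located with pandas. A minimal sketch, assuming a small inline sample in place of the real empES.csv (here a header line that reappears mid-file, which is one hypothetical way such rows can arise):

```python
import io

import pandas as pd

# Hypothetical sample: a header line repeated in the middle of the data
sample = "cnpj,uf\n2135,ES\n8338,ES\ncnpj,uf\n11207,ES\n"
duf = pd.read_csv(io.StringIO(sample), usecols=["cnpj"])

# The whole column is read as text because one value is not a number
print(duf["cnpj"].dtype)

# Values that cannot be converted become NaN under errors="coerce" and are flagged here
bad = pd.to_numeric(duf["cnpj"], errors="coerce").isna()
print(duf[bad])
print(bad.sum(), "non-numeric rows")
```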

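Then I opened it in Notepad++ and saw that, partway down, the column contained "cnpj" again instead of a value. I searched for it and found another 200 rows with the same "cnpj" in place of a value. Out of roughly 900,000 rows, 200 is not much, so I simply deleted them, and it finally works. Still, although the problem is solved, I don't understand why a non-numeric value breaks the code this way. It must have something to do with the string value being the same as the column name.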
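A likely reason, inferred from how pandas parses columns rather than verified on the real files: the match with the column name is probably incidental, since any value that cannot be parsed as a number would have the same effect. read_csv assigns one type per column, and a single "cnpj" string among the digits makes it keep the whole column as text, so the integer keys parsed from socios.csv never match it. A minimal sketch that drops such rows in code instead of editing the file by hand (the inline samples are hypothetical):

```python
import io

import pandas as pd

# Hypothetical samples: empES.csv with a stray header row, and one chunk of socios.csv
emp_csv = "cnpj,uf\n2135,ES\n8338,ES\ncnpj,uf\n11207,ES\n"
socios_csv = '"cnpj","nome_socio"\n"00000000002135","A"\n"00000000002216","B"\n"00000000011207","C"\n'

duf = pd.read_csv(io.StringIO(emp_csv), usecols=["cnpj"])
chunk = pd.read_csv(io.StringIO(socios_csv))

# As read, the key column is text, so none of the integer cnpj values in the chunk match
print(chunk["cnpj"].isin(duf["cnpj"]).any())  # False

# Coerce to numbers, drop what was not a number (the stray "cnpj" rows), cast to int64
keys = pd.to_numeric(duf["cnpj"], errors="coerce").dropna().astype("int64")
print(chunk[chunk["cnpj"].isin(keys)]["nome_socio"].tolist())  # ['A', 'C']
```

On the real data, keys built this way could be passed to isin in place of duf.cnpj. Since errors="coerce" silently discards every non-numeric value, it is worth comparing len(keys) with len(duf) to see how many rows were dropped.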

answered 2020-07-03T03:09:16.160