我有一个来自美国人口普查的固定宽度文件。它是 zip 中名为“orgeo2010.sf1”的那个。zip是一个大文件。我想将该文件读入 PostgreSQL 12.1 中的表中。这就是我建表的方式。
create table census_2010.geo_header_sf1
(
fileid varchar(6),
stusab varchar(2),
sumlev varchar(3),
geocomp varchar(2),
chariter varchar(3),
cifsn varchar(2),
logrecno integer PRIMARY KEY,
region varchar(1),
division varchar(1),
state varchar(2),
county varchar(3),
countycc varchar(2),
countysc varchar(2),
cousub varchar(5),
cousubcc varchar(2),
cousubsc varchar(2),
place varchar(5),
placecc varchar(2),
placesc varchar(2),
tract varchar(6),
blkgrp varchar(1),
block varchar(4),
iuc varchar(2),
concit varchar(5),
concitcc varchar(2),
concitsc varchar(2),
aianhh varchar(4),
aianhhfp varchar(5),
aianhhcc varchar(2),
aihhtli varchar(1),
aitsce varchar(3),
aits varchar(5),
aitscc varchar(2),
ttract varchar(6),
tblkgrp varchar(1),
anrc varchar(5),
anrccc varchar(2),
cbsa varchar(5),
cbsasc varchar(2),
metdiv varchar(5),
csa varchar(3),
necta varchar(5),
nectasc varchar(2),
nectadiv varchar(5),
cnecta varchar(3),
cbsapci varchar(1),
nectapci varchar(1),
ua varchar(5),
uasc varchar(2),
uatype varchar(1),
ur varchar(1),
cd varchar(2),
sldu varchar(3),
sldl varchar(3),
vtd varchar(6),
vtdi varchar(1),
reserve2 varchar(3),
zcta5 varchar(5),
submcd varchar(5),
submcdcc varchar(2),
sdelem varchar(5),
sdsec varchar(5),
sduni varchar(5),
arealand integer,
areawatr integer,
name varchar(90),
funcstat varchar(1),
gcuni varchar(2),
pop100 integer,
hu100 integer,
intptlat varchar(11),
intptlon varchar(12),
lsadc varchar(2),
partflag varchar(1),
reserve3 varchar(6),
uga varchar(5),
statens varchar(8),
countyns varchar(8),
cousubns varchar(8),
placens varchar(8),
concitns varchar(8),
aianhhns varchar(8),
aitsns varchar(8),
anrcns varchar(8),
submcdns varchar(8),
cd113 varchar(2),
cd114 varchar(2),
cd115 varchar(2),
sldu2 varchar(3),
sldu3 varchar(3),
sldu4 varchar(3),
sldl2 varchar(3),
sldl3 varchar(3),
sldl4 varchar(3),
aianhhsc varchar(2),
csasc varchar(2),
cnectasc varchar(2),
memi varchar(1),
nmemi varchar(1),
puma varchar(5),
reserved varchar(18)
);
我试图直接从文件中读取:
Decennial_2010=# COPY census_2010.geo_header_sf1
Decennial_2010-# FROM 'D:\projects_and_data\data\PostgreSQL\data\data\or2010.sf1\orgeo2010.sf1';
ERROR: value too long for type character varying(6)
CONTEXT: COPY geo_header_sf1, line 1, column fileid: "SF1ST OR04000000 00000014941
当这不起作用时,我想也许我可以将它导入 R(我知道足够的 R 来操作 R)并编辑它并编写一个新的 FWF 文件。我尝试使用我的新文件并得到相同的结果。
Decennial_2010=# COPY census_2010.geo_header_sf1
Decennial_2010-# FROM 'D:/projects_and_data/data/PostgreSQL/data/data/geo_a' CSV HEADER;
ERROR: value too long for type character varying(6)
CONTEXT: COPY geo_header_sf1, line 2, column fileid: "SF1ST OR 40000 14941
这是在 R 中保存文件的数据框的第一行,write.fwf()
用于写入不起作用的新文件。我知道我应该使用dput(td[1,])
,但是它会打印所有行(200,000+)中所有因子的所有级别,然后输出dput()
甚至不适合控制台的所有可打印空间。因此,我将复制并粘贴默认情况下显示的行。对不起。
td[1,]
fileid stusab sumlev geocomp chariter cifsn logrecno region division state
1 SF1ST OR 40 00 0 NA 1 4 9 41
county countycc countysc cousub cousubcc cousubsc place placecc placesc tract
1 NA NA NA NA NA NA NA
blkgrp block iuc concit concitcc concitsc aianhh aianhhfp aianhhcc aihhtli
1 NA NA NA NA NA NA NA NA
aitsce aits aitscc ttract tblkgrp anrc anrccc cbsa cbsac metdiv csa necta
1 NA NA NA NA NA NA NA NA NA NA
nectasc nectadiv cnecta cbsapci nectapci ua uasc uatype ur cd sldu sldl vtd
1 NA NA NA NA NA NA NA NA NA NA NA
vtdi reserve2 zcta5 submcd submcdcc sdelem sdsec sduni arealand
1 NA NA NA NA NA NA NA 248607802255
areawatr
1 6191433228
name
1 Oregon
funcstat gcuni pop100 hu100 intptlat intptlon lsadc partflag reserve3 uga
1 A ! 3831074 1675562 43.97152 -120.6226 00 NA NA
statens countyns cousubns placens concitns aianhhns aitsns anrcns submcdns
1 1155107 NA NA NA NA NA NA NA NA
cd113 cd114 cd115 sldu2 sldu3 sldu4 sldl2 sldl3 sldl4 aianhhsc csasc cnectasc
1 NA NA NA NA NA NA NA NA NA NA NA NA
memi nmemi puma reserved
1 NA NA NA NA
library(gdata)
write.fwf(td, "D:/projects_and_data/data/PostgreSQL/data/data/geo_a", sep="")
需要在表或起始文件中更改哪些文件才能将文件复制到 PostgreSQL 中的表中?