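I have a fixed-width file from the US Census. It is the one named "orgeo2010.sf1" inside the zip, which is a large file. I want to read that file into a table in PostgreSQL 12.1. This is how I created the table.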

create table census_2010.geo_header_sf1
 (
    fileid          varchar(6), 
    stusab          varchar(2), 
    sumlev          varchar(3), 
    geocomp         varchar(2), 
    chariter            varchar(3), 
    cifsn           varchar(2), 
    logrecno            integer PRIMARY KEY, 
    region          varchar(1), 
    division            varchar(1), 
    state           varchar(2), 
    county          varchar(3), 
    countycc            varchar(2), 
    countysc            varchar(2), 
    cousub          varchar(5), 
    cousubcc            varchar(2), 
    cousubsc            varchar(2), 
    place           varchar(5), 
    placecc         varchar(2), 
    placesc         varchar(2), 
    tract           varchar(6), 
    blkgrp          varchar(1), 
    block           varchar(4), 
    iuc         varchar(2), 
    concit          varchar(5), 
    concitcc            varchar(2), 
    concitsc            varchar(2), 
    aianhh          varchar(4), 
    aianhhfp            varchar(5), 
    aianhhcc            varchar(2), 
    aihhtli         varchar(1), 
    aitsce          varchar(3), 
    aits            varchar(5), 
    aitscc          varchar(2), 
    ttract          varchar(6), 
    tblkgrp         varchar(1), 
    anrc            varchar(5), 
    anrccc          varchar(2), 
    cbsa            varchar(5), 
    cbsasc          varchar(2), 
    metdiv          varchar(5), 
    csa         varchar(3), 
    necta           varchar(5), 
    nectasc         varchar(2), 
    nectadiv            varchar(5), 
    cnecta          varchar(3), 
    cbsapci         varchar(1), 
    nectapci            varchar(1), 
    ua          varchar(5), 
    uasc            varchar(2), 
    uatype          varchar(1), 
    ur          varchar(1), 
    cd          varchar(2), 
    sldu            varchar(3), 
    sldl            varchar(3), 
    vtd         varchar(6), 
    vtdi            varchar(1), 
    reserve2            varchar(3), 
    zcta5           varchar(5), 
    submcd          varchar(5), 
    submcdcc            varchar(2), 
    sdelem          varchar(5), 
    sdsec           varchar(5), 
    sduni           varchar(5), 
    arealand            integer, 
    areawatr            integer, 
    name            varchar(90), 
    funcstat            varchar(1), 
    gcuni           varchar(2), 
    pop100          integer, 
    hu100           integer, 
    intptlat            varchar(11), 
    intptlon            varchar(12), 
    lsadc           varchar(2), 
    partflag            varchar(1), 
    reserve3            varchar(6), 
    uga         varchar(5), 
    statens         varchar(8), 
    countyns            varchar(8), 
    cousubns            varchar(8), 
    placens         varchar(8), 
    concitns            varchar(8), 
    aianhhns            varchar(8), 
    aitsns          varchar(8), 
    anrcns          varchar(8), 
    submcdns            varchar(8), 
    cd113           varchar(2), 
    cd114           varchar(2), 
    cd115           varchar(2), 
    sldu2           varchar(3), 
    sldu3           varchar(3), 
    sldu4           varchar(3), 
    sldl2           varchar(3), 
    sldl3           varchar(3), 
    sldl4           varchar(3), 
    aianhhsc            varchar(2), 
    csasc           varchar(2), 
    cnectasc            varchar(2), 
    memi            varchar(1), 
    nmemi           varchar(1), 
    puma            varchar(5), 
    reserved            varchar(18)
);

I tried to read directly from the file:

Decennial_2010=# COPY census_2010.geo_header_sf1
Decennial_2010-# FROM 'D:\projects_and_data\data\PostgreSQL\data\data\or2010.sf1\orgeo2010.sf1';
ERROR:  value too long for type character varying(6)
CONTEXT:  COPY geo_header_sf1, line 1, column fileid: "SF1ST OR04000000  00000014941      
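
The CONTEXT line shows the whole record arriving as one value: COPY's default text format splits fields on tab characters, and a fixed-width record has none, so the entire line is assigned to fileid. Below is a minimal sketch of where the first few fields fall, using only the fragment quoted in the error message. The offsets are running totals of the varchar widths in the table definition; the 7-character width for logrecno is an assumption read off that fragment, not something stated in the question.

```sql
-- Slice the record fragment from the error message by position.
-- substr(string, start, length) uses 1-based start positions.
SELECT substr(rec,  1, 6)          AS fileid,    -- 'SF1ST '
       substr(rec,  7, 2)          AS stusab,    -- 'OR'
       substr(rec,  9, 3)          AS sumlev,    -- '040'
       substr(rec, 12, 2)          AS geocomp,   -- '00'
       substr(rec, 14, 3)          AS chariter,  -- '000'
       substr(rec, 17, 2)          AS cifsn,     -- two blanks
       substr(rec, 19, 7)::integer AS logrecno   -- assumed width 7; '0000001' becomes 1
FROM (VALUES ('SF1ST OR04000000  00000014941')) AS t(rec);
```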

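When that did not work, I thought I could import the file into R (I know enough R to get by), edit it, and write a new FWF file. I tried COPY with the new file and got the same result.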

Decennial_2010=# COPY census_2010.geo_header_sf1
Decennial_2010-# FROM 'D:/projects_and_data/data/PostgreSQL/data/data/geo_a' CSV HEADER;
ERROR:  value too long for type character varying(6)
CONTEXT:  COPY geo_header_sf1, line 2, column fileid: "SF1ST OR 40000     14941  

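Below is the first row of the R data frame that holds the file, followed by the write.fwf() call I used to write the new file that did not work. I know I should use dput(td[1,]), but it prints every level of every factor across all 200,000+ rows, and the dput() output does not even fit in the console's scrollback. So I am copying and pasting the row as it prints by default. Sorry.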

td[1,]
  fileid stusab sumlev geocomp chariter cifsn logrecno region division state
1 SF1ST      OR     40      00        0    NA        1      4        9    41
  county countycc countysc cousub cousubcc cousubsc place placecc placesc tract
1     NA                NA     NA                NA    NA              NA    NA
  blkgrp block iuc concit concitcc concitsc aianhh aianhhfp aianhhcc aihhtli
1     NA    NA  NA     NA       NA       NA     NA       NA                 
  aitsce aits aitscc ttract tblkgrp anrc anrccc cbsa cbsac metdiv csa necta
1     NA   NA     NA                  NA     NA   NA    NA     NA  NA    NA
  nectasc nectadiv cnecta cbsapci nectapci ua uasc uatype ur cd sldu sldl vtd
1      NA       NA     NA                  NA   NA     NA NA NA   NA   NA  NA
  vtdi reserve2 zcta5 submcd submcdcc sdelem sdsec sduni     arealand
1            NA    NA     NA       NA     NA    NA    NA 248607802255
    areawatr
1 6191433228
                                                                                        name
1 Oregon                                                                                    
  funcstat gcuni  pop100   hu100 intptlat  intptlon lsadc partflag reserve3 uga
1        A     ! 3831074 1675562 43.97152 -120.6226    00                NA  NA
  statens countyns cousubns placens concitns aianhhns aitsns anrcns submcdns
1 1155107       NA       NA      NA       NA       NA     NA     NA       NA
  cd113 cd114 cd115 sldu2 sldu3 sldu4 sldl2 sldl3 sldl4 aianhhsc csasc cnectasc
1    NA    NA    NA    NA    NA    NA    NA    NA    NA       NA    NA       NA
  memi nmemi puma reserved
1   NA    NA   NA       NA


library(gdata)
write.fwf(td, "D:/projects_and_data/data/PostgreSQL/data/data/geo_a", sep="")

What needs to change in the table or in the source file so that the file can be copied into the table in PostgreSQL?
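
One possible approach, sketched here under stated assumptions rather than as a tested solution, is to load each raw line into a one-column staging table and then cut it into fields with substr(). The staging table name geo_header_raw is hypothetical. The control characters used as CSV delimiter and quote are assumed never to occur in the file. Only the first ten columns are shown; their offsets are running totals of the widths in the table definition, with logrecno assumed to be 7 characters wide based on the record fragment in the error message.

```sql
-- Hypothetical staging table: one text column holding each raw record.
CREATE TABLE census_2010.geo_header_raw (rec text);

-- Load each line whole. The delimiter and quote are control characters
-- assumed to be absent from the file, so no line is ever split.
COPY census_2010.geo_header_raw (rec)
FROM 'D:\projects_and_data\data\PostgreSQL\data\data\or2010.sf1\orgeo2010.sf1'
WITH (FORMAT csv, DELIMITER E'\x01', QUOTE E'\x02');

-- Cut the fixed-width record into fields; blank fields become NULL.
INSERT INTO census_2010.geo_header_sf1
    (fileid, stusab, sumlev, geocomp, chariter, cifsn,
     logrecno, region, division, state)
SELECT nullif(trim(substr(rec,  1, 6)), ''),
       nullif(trim(substr(rec,  7, 2)), ''),
       nullif(trim(substr(rec,  9, 3)), ''),
       nullif(trim(substr(rec, 12, 2)), ''),
       nullif(trim(substr(rec, 14, 3)), ''),
       nullif(trim(substr(rec, 17, 2)), ''),
       substr(rec, 19, 7)::integer,          -- assumed width 7
       nullif(trim(substr(rec, 26, 1)), ''),
       nullif(trim(substr(rec, 27, 1)), ''),
       nullif(trim(substr(rec, 28, 2)), '')
FROM census_2010.geo_header_raw;
```

The remaining columns would follow the same pattern, with the widths of the numeric fields checked against the Census technical documentation. Note also that the first row printed from R shows arealand as 248607802255 and areawatr as 6191433228, both beyond the maximum of PostgreSQL's integer type (2147483647), so those two columns would likely need to be bigint. If COPY reports an invalid byte sequence, its ENCODING option may be needed; the file's encoding is not stated in the question.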
