You need a "ruler" to work out which columns to split at:
cat(">",paste0(rep(c(1:9,"+"),14),collapse=""))
> 123456789+123456789+123456789+123456789+123456789+123456789+123456789+123456789+123456789+123456789+123456789+123456789+123456789+123456789+
cat(">",paste0(sprintf("%08d0/",1:14),collapse=""))
> 000000010/000000020/000000030/000000040/000000050/000000060/000000070/000000080/000000090/000000100/000000110/000000120/000000130/000000140/
# and paste in the first line of data
> CN0100100000000 01 001 Autauga County, AL 1990 16,875 15,853 1,022 6.1
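This lets you work out where to put the splits; taking the differences of those positions, offset by one, gives the widths. Skip the first six lines, and read the data in as character to avoid trouble with factors. Strip the commas before coercing to numeric.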
> dat = read.fwf(url,
widths=diff(c(0,16,21,29,80,86,100,115,125,132)+1),
skip=6,colClasses="character")
> str(dat)
'data.frame': 3219 obs. of 9 variables:
$ V1: chr "CN0100100000000 " "CN0100300000000 " "CN0100500000000 " "CN0100700000000 " ...
$ V2: chr " 01 " " 01 " " 01 " " 01 " ...
$ V3: chr " 001 " " 003 " " 005 " " 007 " ...
$ V4: chr " Autauga County, AL " " Baldwin County, AL " " Barbour County, AL " " Bibb County, AL " ...
$ V5: chr " 1990 " " 1990 " " 1990 " " 1990 " ...
$ V6: chr " 16,875 " " 46,773 " " 11,458 " " 7,408 " ...
$ V7: chr " 15,853 " " 44,492 " " 10,619 " " 6,776 " ...
$ V8: chr " 1,022 " " 2,281 " " 839 " " 632 " ...
$ V9: chr " 6.1" " 4.9" " 7.3" " 8.5" ...
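The width arithmetic can be checked on its own. The sketch below reuses the break positions from the call above; the two toy lines and the names `breaks`, `widths`, `tmp` and `toy` are made up for illustration and are not part of the real file.

```r
# Cumulative end positions of each field, read off the ruler (0 marks the start).
breaks <- c(0, 16, 21, 29, 80, 86, 100, 115, 125, 132)

# Successive differences are the field widths. Adding a constant to every
# break position, as in diff(... + 1), does not change the differences.
widths <- diff(breaks)
print(widths)
stopifnot(identical(diff(breaks + 1), widths))

# Same idea on two made-up 20-character lines with fields ending at 4, 12, 20.
tmp <- tempfile()
writeLines(c("CN01 Autauga  16,875",
             "CN02 Baldwin  46,773"), tmp)
toy <- read.fwf(tmp, widths = diff(c(0, 4, 12, 20)), colClasses = "character")
str(toy)
unlink(tmp)
```

Nine widths come out of ten break positions, which matches the nine variables in the `str()` output.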
dat[6:8] <- lapply( dat[6:8],
function(col) as.numeric( gsub("[,]", "", col)) )
> str(dat)
'data.frame': 3219 obs. of 9 variables:
$ V1: chr "CN0100100000000 " "CN0100300000000 " "CN0100500000000 " "CN0100700000000 " ...
$ V2: chr " 01 " " 01 " " 01 " " 01 " ...
$ V3: chr " 001 " " 003 " " 005 " " 007 " ...
$ V4: chr " Autauga County, AL " " Baldwin County, AL " " Barbour County, AL " " Bibb County, AL " ...
$ V5: chr " 1990 " " 1990 " " 1990 " " 1990 " ...
$ V6: num 16875 46773 11458 7408 19130 ...
$ V7: num 15853 44492 10619 6776 18001 ...
$ V8: num 1022 2281 839 632 1129 ...
$ V9: chr " 6.1" " 4.9" " 7.3" " 8.5" ...
dat[[9]] <- as.numeric( dat[[9]])
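The same cleanup can be tried without downloading anything. This is a minimal sketch on a made-up data frame; `toy` and its values are illustrative only and merely mimic the padded, comma-grouped fields shown above.

```r
# Made-up character columns with padding and thousands separators.
toy <- data.frame(V1 = c(" 16,875 ", " 46,773 "),
                  V2 = c(" 1,022 ", " 839 "),
                  V3 = c(" 6.1", " 4.9"),
                  stringsAsFactors = FALSE)

# Drop the commas first, then coerce. as.numeric() tolerates the surrounding
# blanks, but a leftover comma would give NA with a warning.
toy[1:2] <- lapply(toy[1:2],
                   function(col) as.numeric(gsub(",", "", col, fixed = TRUE)))

# A column without commas can be coerced directly.
toy[[3]] <- as.numeric(toy[[3]])
str(toy)
```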
This could probably be improved by using some "NULL" entries in colClasses to skip unwanted columns.
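One possible reading of that suggestion, offered as an assumption rather than as what was necessarily intended: a "NULL" entry in `colClasses` makes `read.table` (which `read.fwf` calls internally) skip that column. The toy lines, the widths and the names `tmp` and `toy` below are made up for illustration.

```r
# Two made-up 20-character lines: a 4-wide code, an 8-wide name, an 8-wide count.
tmp <- tempfile()
writeLines(c("CN01 Autauga  16,875",
             "CN02 Baldwin  46,773"), tmp)

# "NULL" in colClasses drops the second field; the others stay character.
toy <- read.fwf(tmp, widths = c(4, 8, 8),
                colClasses = c("character", "NULL", "character"))
str(toy)   # two columns remain; the name field is gone
unlink(tmp)
```

Applied to the real file, this would mean a length-nine `colClasses` vector with "NULL" in the positions of the columns you do not need.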