
I'm trying to import some publicly available life outcomes data using the code below:

require(gdata)
# Source SIMD12 data zone level data
simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls", 
                        sheet = "Quick Lookup", verbose = TRUE)

Naturally, the imported data frame doesn't look good. I would like to fix the column names using the code below:

# Clean column names
names(simd.sg.xls) <- make.names(names = as.character(simd.sg.xls[1,]),
                                    unique = TRUE,allow_ = TRUE)

But it produces rather unpleasant results:

> names(simd.sg.xls)
 [1] "X1"       "X1.1"     "X771"     "X354"     "X229"     "X74"      "X67"      "X33"      "X19"      "X1.2"    
[11] "X6"       "X1.3"     "X8"       "X7"       "X7.1"     "X6506"    "X21"      "X1.4"     "X6158"    "X6506.1" 
[21] "X6506.2"  "X6506.3"  "X6263"    "X6506.4"  "X6468"    "X1010"    "X815"     "X99"      "X58"      "X65"     
[31] "X60"      "X6506.5"  "X21.1"    "X1.5"     "X6173"    "X5842"    "X6506.6"  "X6506.7"  "X6263.1"  "X6506.8" 
[41] "X6481"    "X883"     "X728"     "X112"     "X69"      "X56"      "X54"      "X6506.9"  "X21.2"    "X1.6"    
[51] "X6143"    "X5651"    "X6506.10" "X6506.11" "X6263.2"  "X6506.12" "X6480"    "X777"     "X647"     "X434"    
[61] "X518"     "X246"     "X436"     "X6506.13" "X21.3"    "X1.7"     "X6136"    "X5677"    "X6506.14" "X6506.15"
[71] "X6263.3"  "X6506.16" "X660"     "X567"     "X480"     "X557"     "X261"     "X456"  

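My question is whether there is a neat way to coerce the values of the first row into column names. Since I'm dealing with a lot of data, I'm looking for an easily reproducible solution. I can tolerate doing considerable violence to the actual strings to get syntactically valid names, but ideally I would avoid complex regular expressions: I often read files like the one linked here and don't want to be forced to adjust the rules for each import.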


1 Answer


It looks like the problem is that the header is on the second line, not the first. You could pass a skip=1 argument, but a more general approach with read.xls is to use the pattern and header arguments, which make the first line matching the pattern string the header. Your code becomes:

require(gdata)
# Source SIMD12 data zone level data
simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls", 
                        sheet = "Quick Lookup", verbose = TRUE, 
                        pattern="DATAZONE", header=TRUE)
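
To see the effect of skipping the title line without downloading anything, here is a minimal sketch using read.csv, which read.xls calls after converting the sheet to CSV. The inline text and the name toy are made up for illustration and only stand in for the real spreadsheet:

```r
# Made-up stand-in for the exported sheet: a title line, then the real header.
csv_text <- "SIMD 2012 Quick Lookup,,
DATAZONE,Overall Rank,Income Rank
S01000001,771,354
S01000002,229,74"

# skip = 1 drops the title line, so the next line is used as the header.
toy <- read.csv(text = csv_text, skip = 1, header = TRUE,
                stringsAsFactors = FALSE)

names(toy)  # column names are made syntactically valid by read.csv
str(toy)
```

The pattern argument does the same job but locates the header by its content, so it should keep working if the number of title lines changes between files.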

UPDATE

I don't get the warning messages you do when I execute the code. The messages refer to an issue with locale. The locale settings on my system are:

Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

Yours are probably different. Locale data could be OS dependent; I'm using Windows 8.1. I'm also using Strawberry Perl, and you appear to be using something else. Those are possible reasons for the discrepancy in warning messages, but I can't offer anything more specific.

On the second question in your comment: to read the entire file and convert a particular row (in this case, row 2) to column names, you could use the following code:

simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls", 
                        sheet = "Quick Lookup", verbose = TRUE, 
                        header=FALSE, stringsAsFactors=FALSE)

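# Use row 2 as the column names; unlist() turns the one-row data frame
# into a plain character vector before make.names() cleans it up
names(simd.sg.xls) <- make.names(names = unlist(simd.sg.xls[2, ], use.names = FALSE),
                                 unique = TRUE, allow_ = TRUE)
# Drop the title row and the row that is now the header
simd.sg.xls <- simd.sg.xls[-(1:2), ]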
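
The stringsAsFactors=FALSE argument matters here. The names in the question ("X1", "X771", ...) are probably factor codes rather than header text, because coercing a row of factor columns does not return the labels. A small sketch on made-up data (the frames as_factors and as_strings are hypothetical):

```r
# Toy frame where the header text sits in row 1 and columns are factors.
as_factors <- data.frame(V1 = c("DATAZONE", "S01000001"),
                         V2 = c("Overall Rank", "771"),
                         stringsAsFactors = TRUE)
# Coercing a row of factors builds strings from the integer codes,
# not from the labels, so the resulting names are meaningless.
bad <- make.names(as.character(as_factors[1, ]), unique = TRUE)

# The same data kept as character strings.
as_strings <- data.frame(V1 = c("DATAZONE", "S01000001"),
                         V2 = c("Overall Rank", "771"),
                         stringsAsFactors = FALSE)
# Now the row holds the header text itself.
good <- make.names(unlist(as_strings[1, ], use.names = FALSE), unique = TRUE)

print(bad)
print(good)
```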

All data will be of character type so you'll need to convert to factor and numeric as necessary.
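
One possible way to do that conversion in a single pass is base R's type.convert applied column by column. This is a sketch on a made-up two-column frame (toy is hypothetical), not the full SIMD data:

```r
# Made-up frame in which every column was read as character.
toy <- data.frame(DATAZONE = c("S01000001", "S01000002"),
                  Overall.Rank = c("771", "229"),
                  stringsAsFactors = FALSE)

# type.convert() guesses the narrowest type for each column;
# as.is = TRUE keeps non-numeric columns as character instead of factor.
toy[] <- lapply(toy, type.convert, as.is = TRUE)

str(toy)
```

Columns that should be factors can then be converted individually with factor().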

answered 2015-04-19T13:57:08.423