I am trying to import the following text file:

   "year"   "sex"   "name"       "n"    "prop"
"1" 1880    "F"     "Mary"      7065    0.0723835869064085
"2" 1880    "F"     "Anna"      2604    0.0266789611187951
"3" 1880    "F"     "Emma"      2003    0.0205214896777829
"4" 1880    "F"     "Elizabeth" 1939    0.0198657855642641
"5" 1880    "F"     "Minnie"    1746    0.0178884278469341
"6" 1880    "F"     "Margaret"  1578    0.0161672045489473
"7" 1880    "F"     "Ida"       1472    0.0150811946109318
"8" 1880    "F"     "Alice"     1414    0.0144869627580554
"9" 1880    "F"     "Bertha"    1320    0.0135238973413247
"10" 1880    "F"     "Sarah"     1288    0.0131960452845653

This works without any problem using read.table():

data <- read.table("~/Documents/baby_names.txt", header = TRUE, sep = "\t")

However, I haven't figured out how to do the same with readr. The following command fails:

data2 <-read_tsv("~/Documents/baby_names.txt")

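I know the problem is that the first line (the header) contains five elements while the remaining lines contain six, but I don't know how to tell readr to ignore the leading "1", "2", "3", etc. Any suggestions?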
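
The mismatch described above can be checked with base R's count.fields(). This is a minimal sketch, assuming a hypothetical temporary file that holds only the header and the first two rows of the sample. It also hints at why read.table() copes: according to its documentation, when the header has one fewer field than the data rows, the first column is used for the row names.

```r
# Hypothetical stand-in for baby_names.txt: header plus two data rows,
# tab-separated, with a leading row-number field on the data rows only.
path <- tempfile(fileext = ".txt")
writeLines(c(
  '"year"\t"sex"\t"name"\t"n"\t"prop"',
  '"1"\t1880\t"F"\t"Mary"\t7065\t0.0723835869064085',
  '"2"\t1880\t"F"\t"Anna"\t2604\t0.0266789611187951'
), path)

# Number of tab-separated fields on each line of the file
fields <- count.fields(path, sep = "\t")
print(fields)  # 5 6 6
```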

2 Answers

We can read it in two steps (untested):

library(readr)

# read only the header line as data, then convert it to a character vector
myNames <- unlist(read_tsv(file = "myFile.tsv", col_names = FALSE, n_max = 1), use.names = FALSE)

# read the data, skip the header line, then drop the 1st column (the row numbers)
myData <- read_tsv(file = "myFile.tsv", skip = 1, col_names = FALSE)[, -1]

# assign column names
colnames(myData) <- myNames
answered 2016-06-14T07:48:33.630

You can read the body and the column names separately, then combine them:

require(readr)

df <- read_tsv("baby_names.txt", col_names = F, skip = 1)

col_names <- read.table("baby_names.txt", header = F, sep = "\t", nrows = 1)

df$X1 <- NULL
names(df) <- col_names

Result:

> head(df)
     1     1         1    1          1
1 1880 FALSE      Mary 7065 0.07238359
2 1880 FALSE      Anna 2604 0.02667896
3 1880 FALSE      Emma 2003 0.02052149
4 1880 FALSE Elizabeth 1939 0.01986579
5 1880 FALSE    Minnie 1746 0.01788843
6 1880 FALSE  Margaret 1578 0.01616720

I don't think read_tsv() offers an easy way to set row names the way read.table() does with row.names, but this should be an adequate workaround.
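
In the output above, the column names all print as 1 and the sex column prints as FALSE. A likely cause, assuming the R defaults of the time, is that read.table() returned the header fields as factors and that readr guessed a column containing only "F" to be logical. The following is a minimal sketch that avoids both by reading the header with scan() and spelling out the column types. The temporary file, the "rowname" placeholder, and the chosen types are assumptions for illustration, not part of the original data set.

```r
library(readr)

# Hypothetical stand-in for baby_names.txt (header plus two data rows)
path <- tempfile(fileext = ".txt")
writeLines(c(
  '"year"\t"sex"\t"name"\t"n"\t"prop"',
  '"1"\t1880\t"F"\t"Mary"\t7065\t0.0723835869064085',
  '"2"\t1880\t"F"\t"Anna"\t2604\t0.0266789611187951'
), path)

# Header line only: five names as a plain character vector
header <- scan(path, what = character(), sep = "\t", nlines = 1, quiet = TRUE)

# Body: skip the header, name the extra leading field, and fix the types
# (c = character, i = integer, d = double) so "F" is not read as logical
df <- read_tsv(
  path,
  skip = 1,
  col_names = c("rowname", header),
  col_types = "ciccid"
)

# Drop the row-number column
df$rowname <- NULL
```

Naming the leading field explicitly keeps the remaining names aligned with their columns, and it can simply be kept instead of dropped if the row numbers are useful.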

answered 2016-06-14T07:51:05.337