r - How to read multiple lines of a file into one row of a data frame

Question

I have a data file in which individual samples are separated by blank lines, and each field is on its own line:

age 20
weight 185
height 72

age 87
weight 109
height 60

age 15
weight 109
height 58

...

How can I read this file into a data frame so that each row represents one sample, with columns for age, weight and height?

  age weight height
1  20    185     72
2  87    109     60
3  15    109     58
...

score 4 · Accepted Answer

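@user1317221_G has shown the approach I would take, but resorts to loading an extra package and generating the groups explicitly. The groups (an ID variable) are the key to making any reshape-type answer work. The matrix answers don't have this requirement.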

Here is a closely related approach in base R:

mydf <- read.table(header = FALSE, stringsAsFactors=FALSE, 
                   text = "age 20
                   weight 185
                   height 72

                   age 87
                   weight 109
                   height 60

                   age 15
                   weight 109
                   height 58
                   ")

# Create your id variable
mydf <- within(mydf, {
  id <- ave(V1, V1, FUN = seq_along)
})
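To illustrate what the ave() call above does, here it is applied to a short, hypothetical field-name vector standing in for mydf$V1 (two records only). It numbers each occurrence of every field name, so the n-th age, weight and height all share id n. Because V1 is a character column, the ids come back as character strings, which reshape() and xtabs() accept as-is.

```r
# Hypothetical stand-in for mydf$V1 with two records
v1 <- c("age", "weight", "height", "age", "weight", "height")
# Number each occurrence of every field name: 1st "age" -> 1, 2nd "age" -> 2, ...
ids <- ave(v1, v1, FUN = seq_along)
ids
# [1] "1" "1" "1" "2" "2" "2"
```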

With the id variable in place, the transformation is easy:

reshape(mydf, direction = "wide", 
        idvar = "id", timevar="V1")
#   id V2.age V2.weight V2.height
# 1  1     20       185        72
# 4  2     87       109        60
# 7  3     15       109        58

Or:

# Your ids become the "rownames" with this approach
as.data.frame.matrix(xtabs(V2 ~ id + V1, mydf))
#   age height weight
# 1  20     72    185
# 2  87     60    109
# 3  15     58    109

score 2 · Accepted Answer

To expand on @BlueMagister's answer, you can use scan with a few options to read the data directly into a list, and then convert the list to a data frame:

tmp <- scan(text = "
age     20
weight  185
height  72

age     87
weight  109
height  60

age     15
weight  109
height  58", multi.line=TRUE, 
  what=list('',0,'',0,'',0), 
  blank.lines.skip=TRUE)

mydf <- as.data.frame( tmp[ c(FALSE,TRUE) ] )
names(mydf) <- sapply( tmp[ c(TRUE,FALSE) ], '[', 1 )

This assumes the variables within each record always appear in the same order.
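If the field order can differ between records, one possible alternative (a sketch, not part of the original answer) is to number the records by the blank lines that separate them and then pivot by field name rather than by position. It assumes every non-blank line has the form "name value" and that records are separated by at least one blank line; txt, blank, rec, long and wide are illustrative names, and the second record below deliberately lists height before weight.

```r
txt <- "age 20
weight 185
height 72

age 87
height 60
weight 109

age 15
weight 109
height 58"

# For a real file, use lines <- readLines("your_file.txt") instead (hypothetical path)
lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
blank <- trimws(lines) == ""
# Every blank line starts a new record, so a running count of blank lines is a record id
rec <- cumsum(blank) + 1
# Split each non-blank line into its field name and value
parts <- strsplit(trimws(lines[!blank]), "[[:space:]]+")
long <- data.frame(
  id    = rec[!blank],
  field = vapply(parts, `[`, character(1), 1),
  value = as.numeric(vapply(parts, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)
# Pivot on the field name, not on its position within the record
wide <- reshape(long, direction = "wide", idvar = "id", timevar = "field")
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

Because reshape() matches values by field name, a record with a missing field would get NA in that column instead of shifting the remaining values.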

score 1 · Accepted Answer

df <- read.table(text ="
age     1
weight  1
height  6

age     2
weight  7
height  2

age     4
weight  8
height  9", header=FALSE) 

df$ID <- rep(1:3, each=3)
library(reshape2)
newdf <- dcast(df, ID~V1, value.var="V2")

#   ID age height weight
# 1  1   1      6      1
# 2  2   2      2      7
# 3  3   4      9      8

score 1 · Accepted Answer

Another solution:

data <- readLines('c:\\relatorios\\bla.txt') # Read the data
data <- data[trimws(data) != ''] # Remove the blank lines
names <- unique(trimws(gsub('[0-9]*','',data))) # Get the field names, without trailing spaces
data <- matrix(as.numeric(gsub('[^0-9]*','',data)),ncol=length(names),byrow=TRUE) # Create matrix (as.real is defunct)
colnames(data) <- names # Set the names

score 1 · Accepted Answer

Here is something I tried with scan:

##x holds the raw text; use file = "..." instead of text = x to read from a file
##read in three strings separated by newlines, multi-line input
y <- scan(text = x, what = list(character(), character(), character()),
          sep = "\n", multi.line = TRUE)
##combine into a matrix of strings
y <- do.call(cbind,y)
#     [,1]     [,2]         [,3]       
#[1,] "age 20" "weight 185" "height 72"
#[2,] "age 87" "weight 109" "height 60"
#[3,] "age 15" "weight 109" "height 58"
##set column names based on text from the first row
colnames(y) <- regmatches(y[1,],regexpr("^\\w+",y[1,]))
##remove non-numeric characters
y <- gsub("\\D+","",y)
##convert to number format, preserving matrix structure
y <- apply(y,2,as.numeric)
##convert to data frame (if necessary)
y <- data.frame(y)

score 0 · Accepted Answer

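If your source file always contains these three variables, a simple approach is to read the file as two columns (the first holding the names, the second the numbers) and then turn the second column into a matrix. If I borrow df from user1317221_G's answer: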

matrix(df$V2,ncol=3,byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    1    6
[2,]    2    7    2
[3,]    4    8    9

Adding row and/or column names is trivial. Sorry for getting the column order "age, weight, height" :-)
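For completeness, a minimal sketch of that last step, assuming (as the answer does) that every record lists the same three fields in the same order. It rebuilds df as read in user1317221_G's answer, takes the column names from the first record's field names, and converts the matrix to a data frame; m and mydf are illustrative names.

```r
# df as read in user1317221_G's answer: V1 = field name, V2 = value
df <- read.table(header = FALSE, text = "
age     1
weight  1
height  6

age     2
weight  7
height  2

age     4
weight  8
height  9")

# One row per record, assuming a fixed order of three fields
m <- matrix(df$V2, ncol = 3, byrow = TRUE)
# Column names from the first record; as.character() guards against factor columns in older R
colnames(m) <- as.character(df$V1[1:3])
mydf <- as.data.frame(m)
mydf
```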
