r - Insert "empty" rows (padding) into a data frame in R

Question

Problem solved, solution added at the bottom of the post!

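I would like to know how to "pad" a data frame by inserting rows between existing rows (not appending them at the end).

My situation is as follows:

I have a dataset with about 1700 cases and 650 variables
Some variables have possible answer categories from 0 to 100 (the question is "What percentage..." -> people can enter any value from 0 to 100)
Now I want to show the distribution of one of these variables (let's call it var) with geom_area().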

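Problems:

1) I need an x-axis that ranges from 0 to 100

2) Not every possible percentage value in var was chosen. For example, I have the answer "20%" 30 times but no answer "19%". For the x-axis this means the y value at x position 19 is "0" and the y value at x position 20 is "30".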

To prepare my data (this variable) for plotting with ggplot, I transformed it with the table function:

dummy <- as.data.frame(table(var))

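Now I have a column "Var1" with the answer categories and a column "Freq" with the count of each answer category.

In total I have 57 rows, which means 44 of the 101 possible answers (values from 0% to 100%) are missing.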

Example (my data frame), "Var1" contains the given answers, "Freq" the counts:

     Var1 Freq
1     0    1
2     1   16
3     2   32
4     3   44
5     4   14
...
15   14    1
16   15  169 # <-- See next row and look at "Var1"
17   17    2 # <-- "16%" was never given as answer
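The gap at 16 comes from table() itself: it only reports values that actually occur in var. A minimal sketch, assuming the answers are whole numbers from 0 to 100, is to declare every possible answer as a factor level before tabulating, so that unobserved values get a count of 0. The sample vector is made up for illustration.

```r
# Made-up sample answers standing in for the real survey variable `var`
var <- c(0, 1, 1, 2, 15, 15, 15, 17, 20, 20)

# With all values 0..100 declared as levels, table() also counts
# the levels that never occur (their count is 0)
dummy <- as.data.frame(table(factor(var, levels = 0:100)))
names(dummy) <- c("Var1", "Freq")

# Var1 is a factor; convert it back to numbers for a continuous x-axis
dummy$Var1 <- as.numeric(as.character(dummy$Var1))

# One row per possible answer; the row for 16 is present with Freq 0
dummy[dummy$Var1 %in% 15:17, ]
```

Because the padding happens while counting, no rows have to be inserted afterwards.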

Now my question is: how can I create a new data frame that inserts a row after row 16 (where "Var1" = 15), in which I can set "Var1" to 16 and "Freq" to 0?

     Var1 Freq
...
15   14    1
16   15  169
17   16    0 # <-- This line I like to insert
18   17    2
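If only a single row is missing, as in the excerpt above, one option that avoids working out the insert position is to append the row at the end and then re-sort by Var1. A sketch, assuming Var1 has already been converted to numeric (for example with as.numeric(as.character(dummy$Var1))); the excerpt values are copied from the table above:

```r
# Hypothetical excerpt of the frequency table, with Var1 already numeric
dummy <- data.frame(Var1 = c(14, 15, 17), Freq = c(1, 169, 2))

# Append the missing answer category with a count of 0 ...
dummy <- rbind(dummy, data.frame(Var1 = 16, Freq = 0))
# ... then restore the ordering by answer value
dummy <- dummy[order(dummy$Var1), ]
# Renumber the rows 1..n
rownames(dummy) <- NULL
dummy
```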

I already tried something like this:

dummy_x <- NULL
dummy_y <- NULL

for (k in 0:100) {
  pos <- which(dummy$Var1==k)
  if (!is.null(pos)) {
    dummy_x <- rbind(dummy_x, c(k))
    dummy_y <- rbind(dummy_y, dummy$Freq[pos])
  }
  else {
    dummy_x <- rbind(dummy_x, c(k))
    dummy_y <- rbind(dummy_y, 0)
  }
}

newdataframe <- data.frame(cbind(dummy_x), cbind(dummy_y))

This results in an error: dummy_x has 101 values (from 0 to 100, correct), but dummy_y only contains 56 rows?
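The short dummy_y most likely comes from the `!is.null(pos)` test: which() returns integer(0), not NULL, when nothing matches, so the else branch never runs, and rbind() with the empty dummy$Freq[pos] adds no row. A minimal corrected sketch of the same loop, on a small made-up sample, checks length(pos) instead and fills a preallocated vector of zeros:

```r
# Made-up answers; as.data.frame(table(...)) gives columns Var1 (factor) and Freq
dummy <- as.data.frame(table(c(0, 1, 1, 2, 15, 15, 17)))
# Convert the factor labels back to numbers before comparing with k
dummy$Var1 <- as.numeric(as.character(dummy$Var1))

dummy_x <- 0:100
# Preallocate zeros instead of growing vectors with rbind()
dummy_y <- numeric(length(dummy_x))

for (k in dummy_x) {
  # which() returns integer(0), never NULL, when k was never given as an answer
  pos <- which(dummy$Var1 == k)
  if (length(pos) > 0) {
    # R indexes from 1, so answer k is stored in slot k + 1
    dummy_y[k + 1] <- dummy$Freq[pos]
  }
}

newdataframe <- data.frame(Var1 = dummy_x, Freq = dummy_y)
```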

The result should then be plotted like this:

plot(ggplot(newdataframe, aes(x=Var1, y=Freq)) +
   geom_area(fill=barcolors, alpha=0.3) +
   geom_line() +
   labs(title=fragetitel, x=NULL, y=NULL))

Thanks in advance, Daniel

Solution to this problem

library(ggplot2)

plotFreq <- function(var, ftitle=NULL, fcolor="blue") {
  # create data frame from frequency table of var
  # to get answer categories and counts in separate columns
  dummyf <- as.data.frame(table(var))
  # rename to "x-axis" and "y-axis"
  names(dummyf) <- c("xa", "ya")
  # transform $xa from factor to numeric
  dummyf$xa <- as.numeric(as.character(dummyf$xa))
  # get maximum x-value for graph
  maxval <- max(dummyf$xa)
  # Create a vector of zeros, one slot for each value 0..maxval
  frq <- rep(0, maxval + 1)
  # Replace the values in frq for those indices which equal dummyf$xa
  # by dummyf$ya so that remaining indices are ones which you
  # intended to insert. R indexes from 1, so value v goes to slot v + 1
  # (with frq[dummyf$xa] an answer of 0 would be dropped and the counts shifted)
  frq[dummyf$xa + 1] <- dummyf$ya
  # create new data frame
  newdf <- data.frame(var = 0:maxval, frq = frq)
  # print plot
  ggplot(newdf, aes(x=var, y=frq)) +
    # fill area
    geom_area(fill=fcolor, alpha=0.3) +
    # outline
    geom_line() +
    # no additional labels on x- and y-axis
    labs(title=ftitle, x=NULL, y=NULL)
}

score 3 · Accepted Answer

Try something like this:

 insertRowToDF <- function(X, index_after, vector_to_insert) {
   stopifnot(length(vector_to_insert) == ncol(X))  # check that the row to insert is valid
   X <- rbind(X[1:index_after, ], vector_to_insert, X[(index_after + 1):nrow(X), ])
   row.names(X) <- 1:nrow(X)
   return(X)
 }

You can call it with

df <- insertRowToDF(df, 16, c(16, 0))  # inserts the values (16, 0) after the 16th row

score 3 · Accepted Answer

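I think this is a simpler solution; a loop is not necessary. The idea is to create a vector with the size of the desired result, with all values set to zero, and then replace the appropriate entries with the nonzero values from the frequency table.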

> #Let's create sample data
> set.seed(12345)
> var <- sample(100, replace=TRUE)
> 
> 
> #Let's create a frequency table
> x <- as.data.frame(table(var))
> x$var <- as.numeric(as.character(x$var))
> head(x)
  var Freq
1   1    3
2   2    1
3   4    1
4   5    2
5   6    1
6   7    2
> #Create a vector of 0s 
> freq <- rep(0, 100)
> #Replace the values in freq for those indices which equal x$var  by x$Freq so that remaining 
> #indices are ones which you intended to insert 
> freq[x$var] <- x$Freq
> head(freq)
[1] 3 1 0 1 2 1
> #cbind data together 
> freqdf <- as.data.frame(cbind(var = 1:100, freq))
> head(freqdf)
  var freq
1   1    3
2   2    1
3   3    0
4   4    1
5   5    2
6   6    1
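Filling by index works here because the sample values run from 1 to 100. With answers starting at 0, as in the question, freq[x$var] would need an offset of one, since R vectors are indexed from 1. A sketch that sidesteps index arithmetic altogether is a left join of the observed counts onto the complete grid 0..100 with base R's merge(), followed by replacing the resulting NAs with 0; the counts below are made up.

```r
# Made-up observed counts with gaps (no 3, no 16, ...), including an answer of 0
x <- data.frame(var = c(0, 1, 2, 4, 15, 17), Freq = c(1, 3, 1, 1, 169, 2))

# Left join onto every possible answer value; all.x = TRUE keeps
# grid values that have no match in x, giving them Freq = NA
freqdf <- merge(data.frame(var = 0:100), x, by = "var", all.x = TRUE)
# Answers that were never given have a count of 0
freqdf$Freq[is.na(freqdf$Freq)] <- 0
head(freqdf)
```

merge() sorts the result by the key column by default, so the rows come out in the order 0..100.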

score 2 · Accepted Answer

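Here is Aditya's code plus a few conditions that handle the special cases (inserting before the first row or after the last row):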

insertRowToDF <- function(X, index_after, vector_to_insert) {
  stopifnot(length(vector_to_insert) == ncol(X))  # check that the row to insert is valid
  if (index_after != 0) {
    if (dim(X)[1] != index_after) {
      # insert between existing rows
      X <- rbind(X[1:index_after, ], vector_to_insert, X[(index_after + 1):nrow(X), ])
    } else {
      # insert after the last row
      X <- rbind(X[1:index_after, ], vector_to_insert)
    }
  } else {
    if (dim(X)[1] != index_after) {
      # insert before the first row
      X <- rbind(vector_to_insert, X[1:nrow(X), ])
    } else {
      # X has no rows: the new row is the whole result
      X <- rbind(vector_to_insert)
    }
  }
  row.names(X) <- 1:nrow(X)
  return(X)
}
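The special cases are needed because 1:index_after and (index_after+1):nrow(X) count downwards when the end point is smaller than the start (for example, 1:0 is c(1, 0)). A sketch of a branch-free variant, under the hypothetical name insertRowSeq, builds both index ranges with seq_len(), which returns an empty sequence for 0. It assumes the new row is given as a plain vector in column order and that the columns are numeric.

```r
insertRowSeq <- function(X, index_after, vector_to_insert) {
  # Check that the row fits and that the position exists
  stopifnot(length(vector_to_insert) == ncol(X),
            index_after >= 0, index_after <= nrow(X))
  # Turn the vector into a one-row data frame with the same column names
  new_row <- setNames(as.data.frame(as.list(vector_to_insert)), names(X))
  # seq_len(0) is empty, so inserting at the start or the end needs no special case
  before <- X[seq_len(index_after), , drop = FALSE]
  after <- X[index_after + seq_len(nrow(X) - index_after), , drop = FALSE]
  X <- rbind(before, new_row, after)
  # Renumber the rows 1..n
  rownames(X) <- NULL
  X
}

df <- data.frame(Var1 = c(14, 15, 17), Freq = c(1, 169, 2))
insertRowSeq(df, 2, c(16, 0))
```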
