1

As you can see, I have a Price column and a Day column below:

 Price  Day
    2   1
    5   2
    8   3
    11  4
    14  5
    17  6
    20  7
    23  8
    26  9
    29  10
    32  11
    35  12
    38  13
    41  14
    44  15
    47  16
    50  17
    53  18
    56  19
    59  20

And I would like the following output:

  Difference    Day
    12  5
    15  10
    15  15
    15  20

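So now I have the price difference for every 5 days. It basically just subtracts the day 1 price from the day 5 price, then the day 5 price from the day 10 price, and so on. I have already written code that splits my data into 5-day intervals, but I want code that lets me do those subtractions (day 5 minus day 1, day 10 minus day 5, etc.). So the code should look something like this: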

difference<-tapply(Price[,1],Day, ____________)

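So basically Price[,1] is my price data, and "Day" is a variable I created that splits my Day data into 5-day intervals. I'm thinking there is a function or some other variable I could put in the blank that would subtract the day 1 price from the day 5 price, then the day 5 price from the day 10 price, and so on. You don't have to help me split my days into intervals, just with how to do the "difference" part. Thanks, everyone.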

4

3 Answers

5

Here's one option, assuming your data.frame is called "SODF":

within(SODF[c(1, seq(5, nrow(SODF), 5)), ], { 
  Price <- diff(c(0, Price)) 
})[-1, ]
#    Price Day
# 5     12   5
# 10    15  10
# 15    15  15
# 20    15  20

The first step is basic subsetting. According to your description and expected answer, you want the first row, and then every fifth row starting from row 5:

> SODF[c(1, seq(5, nrow(SODF), 5)), ]
   Price Day
1      2   1
5     14   5
10    29  10
15    44  15
20    59  20

From there, you can use diff on the "Price" column. Since diff returns a vector that is one element shorter than its input, you need to "pad" the input vector, which I did with diff(c(0, Price)).

# Correct values, but only 4 of them; the subset has 5 rows
> diff(SODF[c(1, seq(5, nrow(SODF), 5)), "Price"])
[1] 12 15 15 15

Then, the [-1, ] at the end just deletes the extraneous row.
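
If you want to run those steps one at a time, here is a minimal, self-contained sketch. It rebuilds SODF from the 20 rows in the question; the names keep, sub and result are just illustrative and not part of the original one-liner.

```r
# Rebuild the question's data: Price goes 2, 5, 8, ..., 59 over days 1 to 20
SODF <- data.frame(Price = seq(2, 59, by = 3), Day = 1:20)

# Step 1: keep the first row, then every fifth row starting at row 5
keep <- c(1, seq(5, nrow(SODF), by = 5))
sub <- SODF[keep, ]

# Step 2: pad with 0 so diff() returns one value per row of `sub`
sub$Price <- diff(c(0, sub$Price))

# Step 3: drop the first row, which only served as the baseline
result <- sub[-1, ]
result
```

This should give the same four rows as the within() version above; it only spells out the intermediate objects.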

Update

As @geektrader points out in the comments below (thanks!), as an alternative to using:

SODF[c(1, seq(5, nrow(SODF), 5)), ]

as your input data.frame, you may consider using the following instead:

rbind(SODF[1, ], SODF[SODF$Day %% 5 == 0, ])

The difference in the two approaches is that the first approach simply subsets by row number, while the second approach subsets according to the value in the "Day" column, extracting rows where "Day" is a multiple of 5. This second approach might be useful, for instance, when there are missing rows in the dataset.
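
To make that concrete, here is a small sketch that assumes a hypothetical copy of the data, SODF_gap, in which the row for day 3 is missing. The names SODF_gap, by_position and by_value are made up for the illustration.

```r
# Rebuild the question's data, then drop one row
SODF <- data.frame(Price = seq(2, 59, by = 3), Day = 1:20)
SODF_gap <- SODF[-3, ]   # hypothetical: day 3 was never recorded

# Subsetting by row position now lands on the wrong days
by_position <- SODF_gap[c(1, seq(5, nrow(SODF_gap), by = 5)), ]
by_position$Day
# 1 6 11 16

# Subsetting by the value of Day still picks days 1, 5, 10, 15, 20
by_value <- rbind(SODF_gap[1, ], SODF_gap[SODF_gap$Day %% 5 == 0, ])
by_value$Day
# 1 5 10 15 20

diff(by_value$Price)
# 12 15 15 15
```

Note that the value-based version still takes row 1 as the baseline, so it assumes the first row of the data is the day you want to start from.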

answered 2013-03-08T03:53:22.017
1

Ananda's is a great approach (one I always forget about myself). Here's another approach:

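# Rows 5, 10, 15, ... (R silently drops the index 0 produced by seq)
dat2 <- dat[seq(0, nrow(dat), by=5), ]
# Prepend the day-1 price so the first difference is day 5 minus day 1
data.frame(Difference=diff(c(dat[1,1], dat2[, 1])), Day=dat2[, 2])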
answered 2013-03-08T03:59:47.837
0

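If you have a matrix as input, here is a solution.

The function below, given a matrix m, a column col_id and a numeric interval interv, takes every interv-th row of m and subtracts the previous selected value in column col_id from the current one (normally the value interv rows earlier in the same column; the first difference uses row 1 as its baseline).

The result is stored in a new column named diff, appended as the last column of the selected rows of m.

In short, the approach is very similar to the one used by @Ananda Mahto.

So, here is the function: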

subtract_column <- function(m, col_id, interv) {
  select <- c(1, seq(interv, nrow(m), interv))
  cbind(m[select[-1], ], diff = diff(m[select, col_id]))
}

Example:

# this emulates your data as a matrix
price_vect <- c(2,5,8,11,14,17,20,23,26,29,32,35,38,41,44,47,50,53,56,59)
day_vect <- 1:20
matr <- do.call(cbind, list(price = price_vect, day = day_vect))
# and this calls the function above and does the job:
# subtracts every 5 rows the current and the previous (5 rows back) value in the column `price` of matrix `matr`
subtract_column(matr, 'price', 5)

Output:

     price day diff
[1,]    14   5   12
[2,]    29  10   15
[3,]    44  15   15
[4,]    59  20   15
answered 2013-03-08T04:15:56.223