r - Dplyr rolling update of a data frame based on a condition

Question

Say I have a data frame:

  stim1  stim2    choice  outcome   feedback
1     2     1      0       0           1
2     3     2      1       1           1
3     2     3      1       0           1
4     2     3      0       1           1

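My goal is to compute, at each row, an updated value for stim1 and stim2: the cumulative mean outcome over the previous trials in which that stimulus was chosen.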

choice=0 -> stim1 was chosen. 
choice=1 -> stim2 was chosen. 


As an algorithm:
a) For stim=2, find all previous trials where (stim1=2 & choice=0) | (stim2=2 & choice=1)
b) calculate the mean outcome over all such choices  

For example, at trial 4 the observed outcomes for stim1 (i.e. for stimulus 2) are:
    In trial 1 it was chosen (choice=0) and outcome=0
    In trial 2 it was chosen (choice=1) and outcome=1
    In trial 3 it was not chosen (choice=1), so it is not included in the count
    So the observed outcome is 1/2
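The two steps above can be sketched in base R for a single stimulus and trial. This is only an illustration: the helper name `observed_mean` is hypothetical, and the data frame is rebuilt from the table in the question with lowercase column names.

```r
# Example data from the question (rebuilt here so the snippet runs on its own).
dat <- data.frame(
  stim1   = c(2, 3, 2, 2),
  stim2   = c(1, 2, 3, 3),
  choice  = c(0, 1, 1, 0),
  outcome = c(0, 1, 0, 1)
)

# Hypothetical helper: mean outcome of `stim` over the trials before `trial`
# in which `stim` was the chosen stimulus.
observed_mean <- function(dat, stim, trial) {
  earlier <- seq_len(nrow(dat)) < trial
  # Step a): trials where `stim` was shown and chosen, in either position.
  chosen <- (dat$stim1 == stim & dat$choice == 0) |
    (dat$stim2 == stim & dat$choice == 1)
  # Step b): mean outcome over those trials; NaN when there are none.
  mean(dat$outcome[earlier & chosen])
}

observed_mean(dat, stim = 2, trial = 4)  # 0.5, as in the worked example
observed_mean(dat, stim = 3, trial = 4)  # 0
```

Calling it for a stimulus with no earlier choices, e.g. `observed_mean(dat, stim = 3, trial = 3)`, returns `NaN` because `mean()` of an empty vector is `NaN`.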

Desired result:

  stim1  stim2 choice  outcome feedback    Observed_Stim1   Observed_Stim2
1     2     1      0       0     1            NaN              NaN
2     3     2      1       1     1            NaN               0
3     2     3      1       0     1            1/2              NaN
4     2     3      0       1     1            1/2               0

An inefficient loop version of what I am trying to do is:

# For each trial, average the outcomes of all earlier feedback trials in
# which the stimulus shown in that position was the chosen one.
data$trial <- seq_len(nrow(data))
data$relative_stim1 <- rep(NaN, nrow(data))
data$relative_stim2 <- rep(NaN, nrow(data))
for (i in 2:nrow(data)) {
  earlier <- data$feedback == 1 & data$trial < data$trial[i]
  chose1 <- data$choice == 0  # stim1 was chosen
  chose2 <- data$choice == 1  # stim2 was chosen

  data$relative_stim1[i] <- mean(data$outcome[which(earlier &
    ((data$stim1 == data$stim1[i] & chose1) | (data$stim2 == data$stim1[i] & chose2)))])
  data$relative_stim2[i] <- mean(data$outcome[which(earlier &
    ((data$stim1 == data$stim2[i] & chose1) | (data$stim2 == data$stim2[i] & chose2)))])
}
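The loop rescans every earlier row on each iteration. One possible alternative, sketched here in base R, makes a single pass and keeps a running sum and count per stimulus. The function name `running_observed` is hypothetical, the column names are assumed to be lowercase, and `choice` is assumed to always be 0 or 1.

```r
# Example data from the question; lowercase column names are an assumption.
dat <- data.frame(
  stim1    = c(2, 3, 2, 2),
  stim2    = c(1, 2, 3, 3),
  choice   = c(0, 1, 1, 0),
  outcome  = c(0, 1, 0, 1),
  feedback = c(1, 1, 1, 1)
)

# Hypothetical helper: one pass over the trials, keeping a running sum and
# count of outcomes for every stimulus that has been chosen so far.
running_observed <- function(dat) {
  n <- nrow(dat)
  obs1 <- rep(NaN, n)
  obs2 <- rep(NaN, n)
  sums <- numeric(0)    # running outcome sums, named by stimulus id
  counts <- numeric(0)  # running choice counts, named by stimulus id
  for (i in seq_len(n)) {
    k1 <- as.character(dat$stim1[i])
    k2 <- as.character(dat$stim2[i])
    # Read the totals before adding trial i, so only earlier trials count.
    if (k1 %in% names(counts)) obs1[i] <- sums[[k1]] / counts[[k1]]
    if (k2 %in% names(counts)) obs2[i] <- sums[[k2]] / counts[[k2]]
    if (dat$feedback[i] == 1) {
      k <- if (dat$choice[i] == 0) k1 else k2  # stimulus chosen on trial i
      if (!(k %in% names(counts))) {
        sums[k] <- 0
        counts[k] <- 0
      }
      sums[k] <- sums[[k]] + dat$outcome[i]
      counts[k] <- counts[[k]] + 1
    }
  }
  cbind(dat, relative_stim1 = obs1, relative_stim2 = obs2)
}

res <- running_observed(dat)
res[, c("relative_stim1", "relative_stim2")]
```

On the four example trials this gives NaN, NaN, 0.5, 0.5 for `relative_stim1` and NaN, 0, NaN, 0 for `relative_stim2`, matching the desired result.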

score 0 · Accepted Answer

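The dplyr package contains several functions for cumulative operations like this. In your case, you will want to combine them with group_by() to group by stimulus.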

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

dat <- tibble::tribble(
  ~stim1, ~stim2, ~choice, ~outcome, ~feedback,
  2,     1,      0,       0,           1,
  3,     2,      1,       1,           1,
  2,     3,      1,       0,           1,
  2,     3,      0,       1,           1
)

dat |> 
  group_by(stim1) |> 
  mutate(
    count_stim1 = row_number(), 
    observed_stim1 = cumsum(outcome) / row_number()
  ) |> 
  group_by(stim2) |> 
  mutate(
    count_stim2 = row_number(), 
    observed_stim2 = cumsum(outcome) / row_number()
  ) |> 
  ungroup()
#> # A tibble: 4 x 9
#>   stim1 stim2 choice outcome feedback count_stim1 observed_stim1 count_stim2
#>   <dbl> <dbl>  <dbl>   <dbl>    <dbl>       <int>          <dbl>       <int>
#> 1     2     1      0       0        1           1          0               1
#> 2     3     2      1       1        1           1          1               1
#> 3     2     3      1       0        1           2          0               1
#> 4     2     3      0       1        1           3          0.333           2
#> # ... with 1 more variable: observed_stim2 <dbl>

^{Created on 2021-08-18 by the reprex package (v2.0.0)}
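Note that grouping by the stim1 and stim2 columns separately includes the current row and ignores `choice`, which is why row 4 shows 0.333 rather than the 1/2 the question asks for. A variant that follows the question's algorithm more closely might reshape the data so that each stimulus shown on a trial gets its own row, and only then group by stimulus. The sketch below is one possible way to do that, not the answer's method: it assumes tidyr is available, that the two stimuli on a trial are always different, and the intermediate names (`pos`, `chosen`, `hit`, `prev_n`, `prev_sum`) are made up for illustration.

```r
library(dplyr)
library(tidyr)

dat <- tibble::tribble(
  ~stim1, ~stim2, ~choice, ~outcome, ~feedback,
  2,     1,      0,       0,           1,
  3,     2,      1,       1,           1,
  2,     3,      1,       0,           1,
  2,     3,      0,       1,           1
) |>
  mutate(trial = row_number())

observed <- dat |>
  # One row per (trial, displayed stimulus); pos is "1" or "2".
  pivot_longer(c(stim1, stim2), names_to = "pos",
               names_prefix = "stim", values_to = "stim") |>
  mutate(
    # Was this stimulus the one chosen on a feedback trial?
    chosen = feedback == 1 &
      ((pos == "1" & choice == 0) | (pos == "2" & choice == 1)),
    hit = chosen * outcome
  ) |>
  group_by(stim) |>
  arrange(trial, .by_group = TRUE) |>
  mutate(
    # Subtract the current row so only earlier trials are counted.
    prev_n = cumsum(chosen) - chosen,
    prev_sum = cumsum(hit) - hit,
    observed = prev_sum / prev_n  # 0 / 0 gives NaN when never chosen before
  ) |>
  ungroup() |>
  select(trial, pos, observed) |>
  pivot_wider(names_from = pos, values_from = observed,
              names_prefix = "observed_stim")

result <- left_join(dat, observed, by = "trial")
result
```

For the example data, `observed_stim1` comes out as NaN, NaN, 0.5, 0.5 and `observed_stim2` as NaN, 0, NaN, 0, which matches the desired result in the question.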
