
(This is basically the dataframe from the picture)

structure(list(mz = c(40, 50, 60, 70, 80, 90), 
`sample in1` = c(10, 51, 125, 99, 675, 12), 
`sample in2` = c(9, 51, 125, 105, 2424, 5),
`Sample in3` = c(1, 51, 125, 300, 1241, 0.02), 
`blank 1` = c(5, 20, 50, 68, 0, 0),
`blank 2` = c(10, 20, 50, 77, 0, 0),
`blank 3` = c(15, 20, 50, 89, 0, 0.01)), 
row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

I have a large dataframe with over 50 columns and 50,000 rows; this is a simplified version of it.


What I basically need to do is collapse the "blank" columns into one column. I have already managed this by using rowMeans on all blank columns. Next, I want to apply a function to each sample value in my dataframe: sample value / mean of all blank columns for that row = Ratio SB. The tricky part is that I do not want the result of this function to replace the sample values. Instead, I want the following:

(1) Leave the sample value as-is if Ratio SB is larger than a set limit (for example, Ratio SB > 2.5).
(2) Return 0 (or NA) if Ratio SB is smaller than that limit (for example, Ratio SB < 2.5).
(3) Do not run the function on the samples (or leave the sample values as-is) when the averaged blank value for that row is 0, as in the second-to-last row of the dataframe above.

Those are the three essential requirements for my code. The output should still be a dataframe; applied to the dataframe shown above, the function should produce the changes shown below.

This is what it should look like after applying the code:

structure(list(mz = c(40, 50, 60, 70, 80, 90), 
`sample in1` = c(NA, 51, NA, NA, 675, 12), 
`sample in2` = c(NA, 51, NA, NA, 2424, 5),
`Sample in3` = c(NA, 51, NA, 300, 1241, NA), 
`Blank Average` = c(10, 20, 50, 78, NA, 0.00333333)),
row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

So far I have tried case_when, but for that I apparently need to convert to a list structure and cannot apply it to a dataframe directly. I also tried Ratio_X <- sample/averaged blank followed by dataframe[Ratio_x < 2] <- 0, and tried apply and map_dfc, but none of it worked. I'm probably doing something terribly wrong and would love any tips or a solution :)

I apologize if anything is unclear; I am a complete beginner in R and on Stack Overflow. Please let me know if you need additional information. Thanks in advance!


1 Answer


Here is a base R approach:

#Get all the columns that have 'blank' in their name
blank_cols <- grep('blank', names(df))
#Get all the columns that have 'sample' in their name.
#Using `ignore.case` because the 3rd sample column is named "Sample"
sample_cols <- grep('sample', names(df), ignore.case = TRUE)
#Threshold limit to check for
thresh <- 2.5

#Get mean of blank_cols
df$Blank_avg <- rowMeans(df[, blank_cols], na.rm = TRUE)
#Divide sample_cols by the row mean, replace values by NA if the ratio is at or below thresh
df[sample_cols][sweep(df[sample_cols], 1, df$Blank_avg, `/`) <= thresh] <- NA
#Turn Blank_avg to NA where Blank_avg = 0
df$Blank_avg[df$Blank_avg == 0] <- NA
#Remove blank_cols
result <- df[, -blank_cols]
result
# A tibble: 6 x 5
#     mz `sample in1` `sample in2` `Sample in3` Blank_avg
#  <dbl>        <dbl>        <dbl>        <dbl>     <dbl>
#1    40           NA           NA        NA     10      
#2    50           51           51        51     20      
#3    60           NA           NA        NA     50      
#4    70           NA           NA       300     78      
#5    80          675         2424      1241     NA      
#6    90           12            5       0.02   0.00333
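
The steps above can also be wrapped in a function so that the threshold is easy to change. The following is only a sketch under a few assumptions: the helper name mask_low_sb is hypothetical (it is not part of the original answer), the example data is rebuilt as a plain data.frame rather than a tibble, and a ratio of 0/0 (NaN) is treated as "leave the value untouched". Rows whose blank average is 0 give a ratio of Inf, which is never at or below the threshold, so those sample values are kept.

```r
# Hypothetical helper, not from the original answer: sets sample values to NA
# when sample / blank average is at or below `thresh`.
mask_low_sb <- function(df, thresh = 2.5) {
  blank_cols  <- grep("blank",  names(df), ignore.case = TRUE)
  sample_cols <- grep("sample", names(df), ignore.case = TRUE)

  # Per-row mean of the blank columns
  blank_avg <- rowMeans(df[blank_cols], na.rm = TRUE)

  # Divide each sample column by the blank average of the same row
  ratio <- sweep(as.matrix(df[sample_cols]), 1, blank_avg, "/")

  # x / 0 is Inf (kept); 0 / 0 is NaN, treated here as "do not mask"
  low <- ratio <= thresh
  low[is.na(low)] <- FALSE

  samples <- df[sample_cols]
  samples[low] <- NA
  df[sample_cols] <- samples

  # Report a blank average of 0 as NA, then drop the original blank columns
  blank_avg[which(blank_avg == 0)] <- NA
  out <- df[setdiff(seq_along(df), blank_cols)]
  out$Blank_avg <- blank_avg
  out
}

# Example data from the question (check.names = FALSE keeps the spaces in names)
df <- data.frame(
  mz = c(40, 50, 60, 70, 80, 90),
  `sample in1` = c(10, 51, 125, 99, 675, 12),
  `sample in2` = c(9, 51, 125, 105, 2424, 5),
  `Sample in3` = c(1, 51, 125, 300, 1241, 0.02),
  `blank 1` = c(5, 20, 50, 68, 0, 0),
  `blank 2` = c(10, 20, 50, 77, 0, 0),
  `blank 3` = c(15, 20, 50, 89, 0, 0.01),
  check.names = FALSE
)

result <- mask_low_sb(df, thresh = 2.5)
print(result)
```

Passing thresh as an argument makes it straightforward to try other limits, and using 0 instead of NA only requires changing the value assigned to samples[low].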
answered 2020-09-24T04:54:43.320