r - automatic filtering of measurement data within R

Question

My question is about automatic filtering of measurement data, because I have several hundred files to process. The file structure looks like this:

test1 <- read.table("~/test1.txt",sep="\t",dec=".",skip=17,header=TRUE)

Number  Time.s      Potential.V Current.A
1       0.0000      0.060       -0.7653
2       0.0285      0.060       -0.7597
3       0.0855      0.060       -0.7549
.....
17      0.8835      0.060       -0.7045
18      0.9405      0.060       -0.5983
19      0.9975      0.061       -0.1370
20      1.0545      0.062        0.1295
21      1.1115      0.063        0.2680
......
8013    456.6555    0.066       -1.1070
8014    456.7125    0.065       -1.1850
8015    456.7695    0.063       -1.2610
8016    456.8265    0.062       -1.3460
8017    456.8835    0.061       -1.4380
8018    456.9405    0.060       -1.4350
8019    456.9975    0.060       -1.0720
8020    457.0545    0.060       -0.8823
8021    457.1115    0.060       -0.7917
8022    457.1685    0.060       -0.7481

I need to get rid of the extra lines at the beginning and at the end where Potential.V == 0.06. My problem is that the number of these lines isn't fixed; it varies from file to file.

A further restriction is that each file contains several measurements one after another, so I can't just remove all lines with 0.06 from the data.frame.

At the moment I do the cutting manually, which is not very elegant, but I don't know of a better solution:

test_b1 <- data.frame(test1$Number[18:8018],test1$Time.s[18:8018],test1$Potential.V[18:8018],test1$Current.A[18:8018])

I tried using iterations like

for (c in 1:(length(test1))) {
    if (counter>1) & ((as.numeric(r[counter])- as.numeric(r[counter-1]))==1) {
       cat("Skip \n")}
}

but I didn't get a working solution, because of a lack of skill on my side :/ .

Is there a package on CRAN or a more elegant way to solve such problems?

Best regards

score 2 · Accepted Answer

Another way using which.max:

# data modified to include 0.06 Potential.V in inner range
d <- read.table(text="Number  Time.s      Potential.V Current.A
1       0.0000      0.060       -0.7653
2       0.0285      0.060       -0.7597
3       0.0855      0.060       -0.7549
17      0.8835      0.060       -0.7045
18      0.9405      0.060       -0.5983
19      0.9975      0.061       -0.1370
19      0.9975      0.060       -0.1370
20      1.0545      0.062        0.1295
21      1.1115      0.063        0.2680
8013    456.6555    0.066       -1.1070
8014    456.7125    0.065       -1.1850
8015    456.7695    0.063       -1.2610
8016    456.8265    0.062       -1.3460
8017    456.8835    0.061       -1.4380
8018    456.9405    0.060       -1.4350
8019    456.9975    0.060       -1.0720
8020    457.0545    0.060       -0.8823
8021    457.1115    0.060       -0.7917
8022    457.1685    0.060       -0.7481", header=TRUE)

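with(d, {
    # first row whose Potential.V differs from 0.06
    inner.start <- which.max(Potential.V != 0.06)
    # last such row, found by searching the reversed vector
    inner.end <- nrow(d) - which.max(rev(Potential.V != 0.06)) + 1
    d[inner.start:inner.end, ]
})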

#    Number   Time.s Potential.V Current.A
# 6      19   0.9975       0.061   -0.1370
# 7      19   0.9975       0.060   -0.1370
# 8      20   1.0545       0.062    0.1295
# 9      21   1.1115       0.063    0.2680
# 10   8013 456.6555       0.066   -1.1070
# 11   8014 456.7125       0.065   -1.1850
# 12   8015 456.7695       0.063   -1.2610
# 13   8016 456.8265       0.062   -1.3460
# 14   8017 456.8835       0.061   -1.4380

If you want to include the 0.06 row just before and after the inner range, subtract 1 from inner.start and add 1 to inner.end.
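The adjustment described above can be wrapped in a small helper. This is only a sketch: the names trim_baseline, baseline and keep_edge are made up for illustration, and returning zero rows when every row sits at the baseline is my assumption (the with() version returns the whole data frame in that case, because which.max on an all-FALSE vector gives 1).

```r
# Hypothetical helper (not part of the original answer): trims leading and
# trailing rows whose Potential.V equals `baseline`. With keep_edge = TRUE,
# one baseline row is retained on each side when such a row exists.
trim_baseline <- function(d, baseline = 0.06, keep_edge = FALSE) {
    inner <- d$Potential.V != baseline
    # assumption: a data frame with only baseline rows yields zero rows
    if (!any(inner)) return(d[0, , drop = FALSE])
    inner.start <- which.max(inner)
    inner.end <- nrow(d) - which.max(rev(inner)) + 1
    if (keep_edge) {
        # clamp so the indices never leave the data frame
        inner.start <- max(inner.start - 1, 1)
        inner.end <- min(inner.end + 1, nrow(d))
    }
    d[inner.start:inner.end, , drop = FALSE]
}

# Small made-up data set with a 0.06 row inside the inner range
d <- data.frame(
    Number      = 1:8,
    Potential.V = c(0.060, 0.060, 0.061, 0.060, 0.062, 0.061, 0.060, 0.060)
)
trim_baseline(d)$Number                    # 3 4 5 6
trim_baseline(d, keep_edge = TRUE)$Number  # 2 3 4 5 6 7
```

The inner 0.06 row (Number 4) survives in both calls, because only the runs touching the two ends are cut.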

score 2 · Accepted Answer

Here is one using rle:

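filter.df <- function(df) {
    pot.rle <- rle(df$Potential.V)
    idx <- cumsum(pot.rle$lengths)   # last row index of each run
    val <- pot.rle$values
    # trim only when the data both starts and ends with a 0.06 run
    chk <- val[1] == 0.06 && val[length(val)] == 0.06
    if (chk) {
        # keep from the last leading 0.06 row to the first trailing 0.06 row
        df[(idx[1]):(max(idx[1], idx[length(idx)-1])+1), ]
    }
}
filter.df(df)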

#    Number   Time.s Potential.V Current.A
# 5      18   0.9405       0.060   -0.5983
# 6      19   0.9975       0.061   -0.1370
# 7      20   1.0545       0.062    0.1295
# 8      21   1.1115       0.063    0.2680
# 9    8013 456.6555       0.066   -1.1070
# 10   8014 456.7125       0.065   -1.1850
# 11   8015 456.7695       0.063   -1.2610
# 12   8016 456.8265       0.062   -1.3460
# 13   8017 456.8835       0.061   -1.4380
# 14   8018 456.9405       0.060   -1.4350
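Exact comparison with == 0.06 works when the values are parsed from text, as they are here. If Potential.V were ever computed rather than read, it could differ from 0.06 in the last bits. A possible variation, sketched under that assumption, is to run rle on a logical "is baseline" vector built with a tolerance. The names trim_runs and tol are hypothetical. Unlike the function above, this sketch drops every boundary 0.06 row rather than keeping one on each side.

```r
# Sketch with illustrative names: run-length encode a logical vector so the
# baseline test can use a tolerance instead of exact equality.
trim_runs <- function(df, baseline = 0.06, tol = 1e-9) {
    if (nrow(df) == 0) return(df)
    is.base <- abs(df$Potential.V - baseline) < tol
    runs <- rle(is.base)
    n <- length(runs$values)
    # rows to drop at each end: length of a leading/trailing baseline run
    lead  <- if (runs$values[1]) runs$lengths[1] else 0
    trail <- if (runs$values[n]) runs$lengths[n] else 0
    # nothing left once both end runs are removed
    if (lead + trail >= nrow(df)) return(df[0, , drop = FALSE])
    df[(lead + 1):(nrow(df) - trail), , drop = FALSE]
}

# Made-up data; the first value is computed and may not equal 0.06 exactly
df <- data.frame(
    Number      = 1:7,
    Potential.V = c(0.1 - 0.04, 0.060, 0.061, 0.060, 0.062, 0.060, 0.060)
)
trim_runs(df)$Number  # 3 4 5
```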

score 2 · Accepted Answer

Here is another one, very similar, also with rle:

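val <- rle(df$Potential.V)
# drop all but the last row of a leading 0.06 run
# (the length check avoids 1:0 counting down when the run has a single row)
if (val$values[1]==0.06 && val$lengths[1]>1) df <- df[-(1:(val$lengths[1]-1)),]
# drop all but the first row of a trailing 0.06 run
if (tail(val$values,1)==0.06 && tail(val$lengths,1)>1) {
    nb <- nrow(df)
    df <- df[-((nb-tail(val$lengths,1)+2):nb),]
}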

It gives:

   Number   Time.s Potential.V Current.A
5      18   0.9405       0.060   -0.5983
6      19   0.9975       0.061   -0.1370
7      20   1.0545       0.062    0.1295
8      21   1.1115       0.063    0.2680
9    8013 456.6555       0.066   -1.1070
10   8014 456.7125       0.065   -1.1850
11   8015 456.7695       0.063   -1.2610
12   8016 456.8265       0.062   -1.3460
13   8017 456.8835       0.061   -1.4380
14   8018 456.9405       0.060   -1.4350
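Since the question mentions several hundred files, any of the trimming approaches can be applied in a loop over a directory. The following is a rough sketch, not a tested pipeline. The read.table arguments (sep, dec, skip = 17, header) are taken from the question, while trim_file, the temporary directory and the dummy header lines are invented here so the example runs on its own.

```r
# Hypothetical helper: read one file as in the question, then cut the leading
# and trailing 0.06 rows with the which.max approach.
trim_file <- function(path, baseline = 0.06) {
    d <- read.table(path, sep = "\t", dec = ".", skip = 17, header = TRUE)
    inner <- d$Potential.V != baseline
    # assumption: a file with only baseline rows yields zero rows
    if (!any(inner)) return(d[0, , drop = FALSE])
    d[which.max(inner):(nrow(d) - which.max(rev(inner)) + 1), , drop = FALSE]
}

# Demo setup only: two tiny files, each with 17 made-up header lines
demo.dir <- tempfile("measurements")
dir.create(demo.dir)
for (i in 1:2) {
    path <- file.path(demo.dir, sprintf("test%d.txt", i))
    writeLines(c(rep("# instrument header", 17),
                 "Number\tTime.s\tPotential.V\tCurrent.A",
                 "1\t0.0000\t0.060\t-0.7653",
                 "2\t0.0285\t0.061\t-0.7597",
                 "3\t0.0855\t0.062\t-0.7549",
                 "4\t0.1425\t0.060\t-0.7481"), path)
}

# Process every .txt file in the directory and keep the results in a named list
files <- list.files(demo.dir, pattern = "\\.txt$", full.names = TRUE)
trimmed <- lapply(files, trim_file)
names(trimmed) <- basename(files)
sapply(trimmed, nrow)
```

For real data, demo.dir would be replaced by the directory that holds the measurement files.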
