
I’m using R version 2.15.3 (2013-03-01) with RStudio 0.97.312 on Ubuntu 12.10. I’m trying to create some histograms of logger data in R. However, some sensors weren’t always working, so I got some tables with #N/A and O/C in it. Here’s an excerpt of the log:

Date    Time    Type    control.value (V)   light.barrier (V)   T hotplate ('C) T mesh ('C) T exhaust ('C)  T camera ('C)   Ref. Junction 1 ('C)

30.03.2012  13:47:50    Interval    0.001   23.556  411.0   O/C 30.5    35.1    23.14
30.03.2012  13:47:51    Interval    0.001   23.556  411.1   O/C 30.3    35.2    23.14
30.03.2012  13:47:52    Interval    0.001   23.556  411.1   O/C 30.2    35.5    23.14
30.03.2012  13:47:53    Interval    0.001   23.556  410.9   O/C 29.8    35.5    23.14
30.03.2012  13:47:54    Interval    0.001   23.556  410.9   O/C 30.1    35.3    23.14
30.03.2012  13:47:55    Interval    0.001   23.556  411.1   O/C 30.2    35.4    23.14
30.03.2012  13:47:56    Interval    0.001   23.556  410.8   O/C 29.8    35.4    23.14
30.03.2012  13:47:57    Interval    0.001   23.556  410.2   O/C 29.4    35.3    23.14
30.03.2012  13:47:58    Interval    0.001   23.556  409.5   O/C 29.1    35.0    23.14
30.03.2012  13:47:59    Interval    0.000   23.556  408.9   O/C 29.3    34.6    23.14
30.03.2012  13:48:00    Interval    0.000   23.556  408.7   O/C #N/A    #N/A    23.14

Output of dput (head(logs), file = "dput.txt"): http://pastebin.de/34176

R won't treat the columns that contain #N/A and O/C as numeric. I can't fix them by hand; the file has 185,000 lines.

When I load the log and try to create a histogram:

> logs <- read.delim("../data/logger/logs/logs.txt", header=TRUE) 
> hist (logs$mesh)

I get this error message:

Fehler in hist.default(logs$mesh) : 'x' muss nummerisch sein

Rough translation (see: How to change the locale of R in RStudio?):

Error in hist.default(logs$mesh) : 'x' must be numeric

The only columns I can create histograms from are the numeric ones reported by sapply. So I thought I had to remove these invalid values to get numeric columns.

How can I remove the invalid rows? I'm also open to ways other than processing them with R, e.g. Perl or Python, if that's more suitable for this task.

This is the output of sapply after loading the log:

> sapply (logs, is.numeric)
     date          time          type control.value light.barrier      hotplate          mesh       exhaust 
    FALSE         FALSE         FALSE          TRUE         FALSE          TRUE         FALSE         FALSE 
   camera     reference 
    FALSE          TRUE 

After replacing #N/A and O/C with NA (https://stackoverflow.com/a/16350443/2333821):

  logs.clean <- data.frame (check.rows = TRUE, apply(logs, 2, sub, pattern = "O/C|#N/A", replacement = NA))

I get this:

> sapply (logs.clean, is.numeric)
     date          time          type control.value light.barrier      hotplate          mesh       exhaust 
    FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE 
   camera     reference
    FALSE         FALSE 
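A sketch of why every column comes back non-numeric, assuming the cause is the apply() call: apply() first converts the data frame to a matrix, and because some columns are non-numeric, the whole matrix becomes character. Wrapping the result in data.frame() therefore gives character columns (or factors in R versions before 4.0, where stringsAsFactors = TRUE was the default), so is.numeric() is FALSE everywhere. The `toy` data frame and `m` are hypothetical names with made-up values.

```r
# Hypothetical toy data standing in for two logger columns
toy <- data.frame(mesh   = c("O/C", "30.5", "#N/A"),
                  camera = c(35.1, 35.2, 35.5),
                  stringsAsFactors = FALSE)

# apply() converts the data frame to a matrix before looping over columns.
# Because "mesh" is character, the whole matrix becomes character,
# including the originally numeric "camera" column.
m <- apply(toy, 2, sub, pattern = "O/C|#N/A", replacement = NA)
is.character(m)   # TRUE

# So no column of the rebuilt data frame is numeric
sapply(as.data.frame(m, stringsAsFactors = FALSE), is.numeric)
```

The markers do become NA, but the conversion back to numbers still has to happen column by column.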

4 Answers


Since you specifically asked about removing rows, this is how I'd do it. There's an alternative below.

#Makes some data
df <- data.frame(A = c("O/C", "#N/A", 1:3), B = c(4:6, "O/C", "#N/A"))
#      A    B
# 1  O/C    4
# 2 #N/A    5
# 3    1    6
# 4    2  O/C
# 5    3 #N/A

#Find rows that contain either value
remove <- apply(df, 1, function(row) any(row == "O/C" | row == "#N/A"))
#Subset using the negated index
df.rows <- df[!remove,]
#   A B
# 3 1 6
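After the rows are dropped, the remaining columns are still factors (or character in R 4.0 and later), so hist() would still complain. A minimal sketch of the conversion, recreating the example data so it runs on its own. Note the as.character() step: calling as.numeric() directly on a factor returns the internal level codes, not the printed values.

```r
# Recreate the example data and the row filter from above
df <- data.frame(A = c("O/C", "#N/A", 1:3), B = c(4:6, "O/C", "#N/A"))
remove <- apply(df, 1, function(row) any(row == "O/C" | row == "#N/A"))
df.rows <- df[!remove, ]

# Convert every remaining column to numeric. as.character() first,
# because as.numeric() on a factor would give level codes, not values.
df.rows[] <- lapply(df.rows, function(col) as.numeric(as.character(col)))

str(df.rows)  # both columns are now num, so hist(df.rows$B) accepts them
```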

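Alternatively, you can look for the values and set them to NA. This doesn't remove the rows, but it lets most functions work with the data.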

df.clean <- data.frame(apply(df, 2, sub, pattern = "O/C|#N/A", replacement = NA))

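I used data.frame() as a quick way to turn the matrix returned by apply() back into a data frame; there is probably a more elegant way to do this... Note that the columns are still character (or factor), not numeric, so they need converting before hist() will accept them.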
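One way to do the replacement and the numeric conversion in a single pass, sketched under the assumption that "O/C" and "#N/A" are the only non-numeric entries in the affected columns. It uses lapply() over the columns instead of apply(), so the data frame is never coerced to a character matrix. The names `bad` and `df.num` are illustrative, not from the original answer.

```r
df <- data.frame(A = c("O/C", "#N/A", 1:3), B = c(4:6, "O/C", "#N/A"))

# Markers the logger writes instead of a reading
bad <- c("O/C", "#N/A")

# Work column by column; df.num[] <- keeps the data.frame structure
df.num <- df
df.num[] <- lapply(df, function(col) {
  col <- as.character(col)   # handles both factor and character columns
  col[col %in% bad] <- NA    # mark sensor errors as missing
  as.numeric(col)            # what's left are plain numbers
})

sapply(df.num, is.numeric)   # TRUE for both columns
```

Because the markers are set to NA before as.numeric() runs, this avoids the "NAs introduced by coercion" warning you would get from converting them directly.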

answered 2013-05-03T01:55:48.797

Since you wrote that you're also open to ways other than processing them with R....

In a regular terminal window (not in the R console):

grep -v  '#N/A' log.txt > cleaned.txt

The -v option inverts the match, printing all lines that do not match.

To keep all lines that contain neither #N/A nor O/C:

grep -v '#N/A\|O/C' log.txt > cleaned.txt
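If you would rather stay inside R, the same line filter can be sketched with readLines(), grepl() and writeLines(). The temporary files and sample lines below are hypothetical stand-ins for log.txt and cleaned.txt. This reads the whole file into memory, which should be manageable for roughly 185,000 lines.

```r
# Hypothetical sample file standing in for log.txt
log_file <- tempfile(fileext = ".txt")
writeLines(c("Date\tTime\tT mesh ('C)",
             "30.03.2012\t13:47:50\t411.0",
             "30.03.2012\t13:47:51\tO/C",
             "30.03.2012\t13:48:00\t#N/A"),
           log_file)

lines <- readLines(log_file)

# Keep lines matching neither marker, like grep -v '#N/A\|O/C'
# ("|" is regex alternation in R's default extended regex)
keep <- !grepl("#N/A|O/C", lines)

cleaned_file <- tempfile(fileext = ".txt")
writeLines(lines[keep], cleaned_file)

readLines(cleaned_file)  # header line plus the one complete data line
```

As with grep, the header line survives because it contains neither marker.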
answered 2013-05-03T04:37:40.657

read.table can help here by ignoring everything after the comment character.

The comment character is set by the comment.char argument.

help(read.table)
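A related argument documented on the same help page is na.strings, which tells read.table() and read.delim() which strings to read as NA, so the affected columns can be parsed as numeric directly. read.delim() already defaults to comment.char = "", so "#N/A" is not treated as a comment there; with read.table() you would need comment.char = "" yourself for this to work. A sketch with a small hypothetical tab-separated file (the file and column names are made up):

```r
# Hypothetical tab-separated sample standing in for logs.txt
log_file <- tempfile(fileext = ".txt")
writeLines(c("date\ttime\tmesh\tcamera",
             "30.03.2012\t13:47:59\tO/C\t34.6",
             "30.03.2012\t13:48:00\tO/C\t#N/A",
             "30.03.2012\t13:48:01\t30.1\t35.0"),
           log_file)

# Read both logger markers as missing values
logs <- read.delim(log_file, header = TRUE, na.strings = c("O/C", "#N/A"))

sapply(logs, is.numeric)  # mesh and camera are now numeric
```

hist() skips NA values, so hist(logs$mesh) should work on the result without removing any rows.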

Of course, I can only guess that you're using read.table, since you didn't give us any example code, error message, or anything else.

answered 2013-05-03T07:18:21.210

This is an old post, but since I stumbled across it, here's how I would remove the rows:

df <- data.frame(A = c("O/C", "#N/A", 1:3), B = c(4:6, "O/C", "#N/A"))
#      A    B
# 1  O/C    4
# 2 #N/A    5
# 3    1    6
# 4    2  O/C
# 5    3 #N/A
cleandf <- df[!df$A %in% c("O/C", "#N/A") & !df$B %in% c("O/C", "#N/A"),]

A one-liner that subsets the data frame with a logical condition.
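The one-liner names each column explicitly. With the ten columns of the logger file, a sketch that checks every column at once could look like this; `bad` and `hit_any` are illustrative names, not from the original answer.

```r
df <- data.frame(A = c("O/C", "#N/A", 1:3), B = c(4:6, "O/C", "#N/A"))
bad <- c("O/C", "#N/A")  # markers written by the logger instead of a value

# One logical vector per column (TRUE where the cell is a marker),
# combined with element-wise OR: TRUE means some cell in that row is bad
hit_any <- Reduce(`|`, lapply(df, function(col) as.character(col) %in% bad))

cleandf <- df[!hit_any, ]
cleandf
#   A B
# 3 1 6
```

Unlike apply(df, 1, ...), this never builds a character matrix of the whole table, which may matter with 185,000 rows.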

answered 2016-09-05T19:46:08.277