I’m using R version 2.15.3 (2013-03-01) with RStudio 0.97.312 on Ubuntu 12.10.
I’m trying to create some histograms of logger data in R. However, some sensors weren’t always working, so some tables contain #N/A and O/C entries.
Here’s an excerpt of the log:
Date Time Type control.value (V) light.barrier (V) T hotplate ('C) T mesh ('C) T exhaust ('C) T camera ('C) Ref. Junction 1 ('C)
30.03.2012 13:47:50 Interval 0.001 23.556 411.0 O/C 30.5 35.1 23.14
30.03.2012 13:47:51 Interval 0.001 23.556 411.1 O/C 30.3 35.2 23.14
30.03.2012 13:47:52 Interval 0.001 23.556 411.1 O/C 30.2 35.5 23.14
30.03.2012 13:47:53 Interval 0.001 23.556 410.9 O/C 29.8 35.5 23.14
30.03.2012 13:47:54 Interval 0.001 23.556 410.9 O/C 30.1 35.3 23.14
30.03.2012 13:47:55 Interval 0.001 23.556 411.1 O/C 30.2 35.4 23.14
30.03.2012 13:47:56 Interval 0.001 23.556 410.8 O/C 29.8 35.4 23.14
30.03.2012 13:47:57 Interval 0.001 23.556 410.2 O/C 29.4 35.3 23.14
30.03.2012 13:47:58 Interval 0.001 23.556 409.5 O/C 29.1 35.0 23.14
30.03.2012 13:47:59 Interval 0.000 23.556 408.9 O/C 29.3 34.6 23.14
30.03.2012 13:48:00 Interval 0.000 23.556 408.7 O/C #N/A #N/A 23.14
Output of dput(head(logs), file = "dput.txt"): http://pastebin.de/34176
R refuses to process the columns containing #N/A and O/C. I can’t reformat the file by hand because it has 185 000 lines.
When I load the log and try to create a histogram:
> logs <- read.delim("../data/logger/logs/logs.txt", header=TRUE)
> hist (logs$mesh)
I get this error message:
Fehler in hist.default(logs$mesh) : 'x' muss nummerisch sein
Rough translation (see: How to change the locale of R in RStudio?):
Error in hist.default(logs$mesh) : 'x' must be numeric
The only columns I can create histograms from are the ones that sapply reports as numeric. So I think I have to remove the invalid values to make the other columns numeric as well.
How can I remove the invalid rows? I’m also open to approaches other than R, e.g. Perl or Python, if that’s more suitable for this task.
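One possible direction, sketched here under the assumption that #N/A and O/C are the only non-numeric markers in the sensor columns: read.delim accepts an na.strings argument, so both markers can be read as NA at load time. The sample data and the name sample_log below are hypothetical stand-ins for logs.txt, not the real file.

```r
# Hypothetical four-row, tab-separated sample standing in for logs.txt.
sample_log <- paste(
  "date\ttime\tmesh\texhaust",
  "30.03.2012\t13:47:58\tO/C\t29.1",
  "30.03.2012\t13:47:59\tO/C\t29.3",
  "30.03.2012\t13:48:00\t411.2\t#N/A",
  "30.03.2012\t13:48:01\t411.0\t29.5",
  sep = "\n"
)

# Treat both sensor-failure markers as missing values while reading.
logs <- read.delim(text = sample_log, header = TRUE,
                   na.strings = c("NA", "#N/A", "O/C"),
                   stringsAsFactors = FALSE)

# The sensor columns are now numeric, with NA where a marker stood.
print(sapply(logs, is.numeric))

# Optionally keep only the rows without any missing value.
logs_complete <- logs[complete.cases(logs), ]
print(logs_complete)
```

With the real file, the text argument would be replaced by the path to logs.txt. Dropping rows may not be needed for plotting, since hist discards non-finite values before binning.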
This is the output of sapply after loading the log:
> sapply (logs, is.numeric)
date time type control.value light.barrier hotplate mesh exhaust
FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE
camera reference
FALSE TRUE
After replacing the #N/A and O/C entries with NA (https://stackoverflow.com/a/16350443/2333821):
logs.clean <- data.frame (check.rows = TRUE, apply(logs, 2, sub, pattern = "O/C|#N/A", replacement = NA))
I get this:
> sapply (logs.clean, is.numeric)
date time type control.value light.barrier hotplate mesh exhaust
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
camera reference
FALSE FALSE
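A likely reason every column now reports FALSE: apply converts the data frame to a matrix first, and a matrix built from mixed columns is a character matrix, so all values come back as text. A minimal sketch of one way around this, assuming the affected columns are known by name, is to replace the markers column by column and convert each column explicitly. The small data frame and the vector sensor_cols are hypothetical illustrations, not the real log.

```r
# Hypothetical miniature of the logs data frame as read.delim might return it.
logs <- data.frame(
  type    = c("Interval", "Interval", "Interval"),
  mesh    = c("O/C", "411.2", "O/C"),
  exhaust = c("29.1", "#N/A", "29.3"),
  stringsAsFactors = TRUE
)

# Columns assumed to hold sensor readings mixed with failure markers.
sensor_cols <- c("mesh", "exhaust")

logs.clean <- logs
logs.clean[sensor_cols] <- lapply(logs[sensor_cols], function(x) {
  x <- as.character(x)                 # factor -> text, not factor codes
  x[x %in% c("O/C", "#N/A")] <- NA     # markers become missing values
  as.numeric(x)                        # remaining text -> numbers
})

print(sapply(logs.clean, is.numeric))
```

Going through as.character first matters when the columns are factors, because as.numeric on a factor returns the internal level codes rather than the values shown.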