This is my first question to StackOverflow. I've searched high and low for an explanation but cannot seem to locate an answer. In short, the relational operator I am using ('>=') is not producing the result I expect:
> data[['Open']][2]
[1] 79.22
> data[['Avg']][2]
[1] 79.22
> data[['Open']][2] >= data[['Avg']][2]
[1] FALSE
The goal is to read the data into a data frame, take an average of yesterday's High, Low, and Close prices, and compare today's Open price with yesterday's average.
After running the scripts on large data, I found that my results in R didn't match a similar analysis run in Excel.
For the StackOverflow community, I scaled the problem down to its essential parts. Since my error may lie in how I am reading in the data, I've included that part of the code as well.
My test file ('test.csv') looks like this, including a new line at the end of the last row:
<TICKER>,<DATE>,<TIME>,<OPEN>,<LOW>,<HIGH>,<CLOSE>
USDJPY,20120713,0:00:00,79.26,79.05,79.37,79.24
USDJPY,20120716,0:00:00,79.22,78.67,79.23,78.84
My Code:
# Read in test file
raw <- read.csv('test.csv', header=TRUE, sep=",")
# Convert date and dump data into data frame, date is formatted for time series
stripday <- strptime(raw$X.DATE., format="%Y%m%d")
data <- data.frame(stripday, raw)
# Drop unused data columns and name the used columns
drops <- c("X.DATE.", "X.TIME.", "X.TICKER.")
data <- data[, !(names(data) %in% drops)]
colnames(data) <- c("Date", "Open", "Low", "High", "Close")
# Convert values from factors to numeric
data[,2] <- as.numeric(as.character(data[,2]))
data[,3] <- as.numeric(as.character(data[,3]))
data[,4] <- as.numeric(as.character(data[,4]))
data[,5] <- as.numeric(as.character(data[,5]))
# Take yesterday's average of High, Low, and Close
data[['Avg']] <- NA
data[['Avg']][2] <- (
data[['High']][1] +
data[['Low']][1] +
data[['Close']][1]) / 3
# Is today's Open greater than or equal to yesterday's Average
data[['OpenGreaterThanAvg']] <- NA
data[['OpenGreaterThanAvg']] <- 1 * (data[['Open']] >= data[['Avg']])
# Write data to .csv
write.table(data, 'output.csv', quote=FALSE, sep=",", row.names=FALSE)
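Since the script above fills in only row 2, here is a minimal sketch of how the same steps might be applied to every row of a larger file. The small data frame is a stand-in built from the values in test.csv, and `row_avg` is a name introduced only for this illustration.

```r
# Stand-in for the data frame built by the script (values from test.csv)
data <- data.frame(
  Open  = c(79.26, 79.22),
  Low   = c(79.05, 78.67),
  High  = c(79.37, 79.23),
  Close = c(79.24, 78.84)
)

# Average of each row's High, Low, and Close
row_avg <- (data$High + data$Low + data$Close) / 3

# Shift down by one row so each row holds the previous row's average;
# the first row has no "yesterday", so it gets NA
data$Avg <- c(NA, head(row_avg, -1))

# Same comparison as in the script, applied to every row at once
data$OpenGreaterThanAvg <- 1 * (data$Open >= data$Avg)

print(data)
```

This uses the same `>=` comparison as the script, so it would show the same unexpected result for row 2.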
Since 79.22 equals 79.22, I would expect OpenGreaterThanAvg to display a "1" instead of a zero.
str() and class() report the same type, and the same displayed value, for the two objects I am trying to compare:
> str(data[['Open']][2])
num 79.2
> str(data[['Avg']][2])
num 79.2
> class(data[['Open']][2])
[1] "numeric"
> class(data[['Avg']][2])
[1] "numeric"
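One further check that may help: str() shows only about three significant digits and default printing shows seven, so two doubles can look identical while being stored differently. The sketch below rebuilds the two values from the numbers in test.csv and prints them with more digits. The variable names are introduced only for this illustration.

```r
# Values copied from the first data row of test.csv
high  <- 79.37
low   <- 79.05
close <- 79.24

# Today's Open, from the second data row
open_today <- 79.22

# Yesterday's average, computed the same way as in the script
avg_yesterday <- (high + low + close) / 3

# Default printing rounds to 7 significant digits
print(open_today)
print(avg_yesterday)

# Requesting 17 significant digits shows the stored doubles more closely
print(open_today, digits = 17)
print(avg_yesterday, digits = 17)

# Raw difference between the two; any nonzero value means they are not identical
print(avg_yesterday - open_today)
```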
Also, note that R tells me that data[['Open']][2] is less than data[['Avg']][2]:
> data[['Open']][2] < data[['Avg']][2]
[1] TRUE
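If the mismatch comes from floating-point rounding in the division (an assumption, not something confirmed here), a comparison with a tolerance might behave the way I expected. The following is only a sketch. `ge_with_tolerance` is a hypothetical helper and the tolerance of 1e-8 is an arbitrary choice, while `all.equal()` and `round()` are base R functions.

```r
open_today    <- 79.22
avg_yesterday <- (79.37 + 79.05 + 79.24) / 3

# Hypothetical helper: TRUE when x is greater than y,
# or when the two differ by no more than tol
ge_with_tolerance <- function(x, y, tol = 1e-8) {
  (x > y) | (abs(x - y) <= tol)
}

print(ge_with_tolerance(open_today, avg_yesterday))

# all.equal() compares with a tolerance; wrap it in isTRUE() because it
# returns a description of the difference rather than FALSE on mismatch
print(isTRUE(all.equal(open_today, avg_yesterday)))

# Alternative: round both sides to the precision of the quotes (2 decimals here)
print(round(open_today, 2) >= round(avg_yesterday, 2))
```

The helper is vectorized, so it could replace the `>=` in the OpenGreaterThanAvg line. Whether a tolerance or rounding is the better fit depends on how many decimals the price data carries.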
Additionally, I am a HUGE fan of constructive criticism, so if you have suggestions unrelated to the question, I would welcome your comments.
Thank you. Brian