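This is a follow-up to a question I posted yesterday. I can't seem to get floating-point comparison right in R. Yesterday I was using >= to compare two floating-point values, but I got incorrect results.
Today I tried running all.equal element-wise on two vectors, and that produced a mean difference, which doesn't work for this application. I need the comparison function to return a vector. Then I found identical and used it with mapply. This got more accurate, but not 100% accurate. What am I doing wrong? Since this is financial data, should I be using a decimal data type? If so, how?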
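A minimal sketch of the three behaviours described above, using toy values that are assumptions for illustration and not taken from my data. It only shows that >= and identical compare exactly, element by element, while all.equal summarises the whole vector:

```r
# Toy values (assumed for illustration, not from test.csv)
x <- c(0.3, 0.5)
y <- c(0.1 + 0.2, 0.5)   # 0.1 + 0.2 is stored slightly above 0.3

ge    <- x >= y                    # elementwise and exact: first element is FALSE
ident <- mapply(identical, x, y)   # elementwise and exact: first element is FALSE

# all.equal() judges the vectors as a whole: TRUE when they agree within its
# tolerance, otherwise a character string describing the difference
same <- all.equal(x, y)
msg  <- all.equal(c(1, 2), c(1, 3))
```

So none of the three gives an element-wise comparison that also forgives rounding error.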
From yesterday's post (code updated to reflect the current frustration):
The goal is: read the data into a data.frame; take the average of yesterday's High, Low, and Close prices; and compare today's Open price with yesterday's average.
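After running the script on a large data set, I noticed that my results in R did not match those in Excel. I've narrowed the problem down to its essentials. My test file, test.csv, looks like this, including a newline at the end of the last row: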
<TICKER>,<DATE>,<TIME>,<OPEN>,<LOW>,<HIGH>,<CLOSE>
EURUSD,20020311,0:00:00,0.8733,0.873,0.877,0.8749
EURUSD,20020312,0:00:00,0.8749,0.8704,0.876,0.8754
EURUSD,20020313,0:00:00,0.8753,0.8725,0.878,0.8754
EURUSD,20020314,0:00:00,0.8753,0.8752,0.8841,0.8823
EURUSD,20020315,0:00:00,0.8823,0.8808,0.8868,0.8823
EURUSD,20020318,0:00:00,0.8809,0.878,0.8828,0.8821
EURUSD,20020319,0:00:00,0.8821,0.8796,0.884,0.8816
EURUSD,20020320,0:00:00,0.8815,0.8786,0.8857,0.8855
EURUSD,20020321,0:00:00,0.8854,0.8806,0.8857,0.8823
My code:
# Read in test file
raw <- read.csv('test.csv', header=TRUE, sep=",")
# Convert date and put data into a data frame
stripday <- strptime(raw$X.DATE., format="%Y%m%d")
data <- data.frame(stripday, raw)
# Drop unused data columns and name used columns
drops <- c("X.DATE.", "X.TIME.", "X.TICKER.")
data <- data[, !(names(data) %in% drops)]
colnames(data) <- c("Date", "Open", "Low", "High", "Close")
# Convert values from factors to numeric
data[,2] <- as.numeric(as.character(data[,2]))
data[,3] <- as.numeric(as.character(data[,3]))
data[,4] <- as.numeric(as.character(data[,4]))
data[,5] <- as.numeric(as.character(data[,5]))
# Take average of High, Low, and Close
data[['Avg']] <- NA
data[['Avg']][2:9] <- (
data[['High']][1:8] +
data[['Low']][1:8] +
data[['Close']][1:8]) / 3
# Is Open greater than or equal to Average
data[['OpenGreaterThanOrEqualAvg']] <- NA
data[['OpenGreaterThanOrEqualAvg']][2:9] <- 1 * (mapply(identical,data[['Open']][2:9], data[['Avg']][2:9]) | data[['Open']][2:9] > data[['Avg']][2:9])
# Write data to .csv
write.table(data, 'output.csv', quote=FALSE, sep=",", row.names=FALSE)
Note that the row for March 14, 2002 (20020314) should be 1, not 0.
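For reference, the kind of comparison I am after could be sketched as a vectorized "greater than or equal, within a tolerance". This is only a sketch under assumptions: ge_tol is a hypothetical helper, not a base R function, and the tolerance of 1e-8 is an arbitrary choice that happens to be far smaller than the 0.0001 price increment in this file.

```r
# Hypothetical helper (not part of base R): elementwise ">=" that treats
# values within `tol` of each other as equal. Returns a logical vector.
ge_tol <- function(x, y, tol = 1e-8) {
  (x - y) >= -tol
}

# Values from the 20020313 and 20020314 rows of test.csv
high  <- 0.878
low   <- 0.8725
close <- 0.8754
avg   <- (high + low + close) / 3   # 0.8753 in exact arithmetic
open  <- 0.8753

result <- ge_tol(open, avg)         # TRUE: any rounding error is far below tol
```

In the script, this would replace the mapply(identical, ...) | ... > ... expression, for example 1 * ge_tol(data[['Open']][2:9], data[['Avg']][2:9]). Rounding both sides to a fixed number of decimals, or storing prices as integer multiples of 0.0001, are other approaches that might fit financial data.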