I am trying to perform a cointegration test on two stocks, using data from Yahoo Finance. From what I have been reading, there are simpler ways to retrieve Yahoo data than the one I am using. I need to retrieve two securities, define them as stk1 and stk2, and be able to adjust the time frame of the data retrieved. Here is what I have so far:
library(zoo)
library(tseries)
# Read the CSV files into data frames
stk1 <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=CAT&a=8&b=1&c=2009&d=12&e=31&f=2010&g=d&ignore=.csv", stringsAsFactors=FALSE)
stk2 <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=DD&a=8&b=1&c=2009&d=12&e=31&f=2010&g=d&ignore=.csv", stringsAsFactors=FALSE)
# The first column contains dates. as.Date converts strings into Date objects
stk1_dates <- as.Date(stk1[,1])
stk2_dates <- as.Date(stk2[,1])
# The seventh column contains the adjusted close. We use the zoo function to
# create zoo objects from that data. The function takes two arguments: a
# vector of data and a vector of dates.
stk1 <- zoo(stk1[,7], stk1_dates)
stk2 <- zoo(stk2[,7], stk2_dates)
# The merge function combines two (or more) zoo objects,
# computing either their intersection (all=FALSE) or union (all=TRUE).
t.zoo <- merge(stk1, stk2, all=FALSE)
# At this point, t.zoo is a zoo object with two columns: stk1 and stk2.
# Most statistical functions expect a data frame for input, so we convert.
t <- as.data.frame(t.zoo)
# Tell the user what dates are spanned by the data.
cat("Date range is", format(start(t.zoo)), "to", format(end(t.zoo)), "\n")
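# Regress stk1 on stk2 with no intercept (+ 0); the slope is the hedge ratio.
m <- lm(stk1 ~ stk2 + 0, data=t)
beta <- coef(m)[1]
cat("Assumed hedge ratio is", beta, "\n")
# The spread is what remains of stk1 after hedging with beta units of stk2.
sprd <- t$stk1 - beta*t$stk2
# Augmented Dickey-Fuller test on the spread; k=0 means no lagged differences.
ht <- adf.test(sprd, alternative="stationary", k=0)
cat("ADF p-value is", ht$p.value, "\n")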
if (ht$p.value < 0.05) {
    cat("The spread is likely mean-reverting\n")
} else {
    cat("No evidence that the spread is mean-reverting.\n")
}
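One thing I have considered is wrapping the steps above in a function and trimming the date range after download with zoo's window(), rather than editing the URL each time. This is only a rough sketch: the function name spread_adf and its from/to arguments are my own invention, and the two series are simulated so that it runs without any download.

```r
library(zoo)
library(tseries)

# Hypothetical helper (not part of any package): runs the same steps as the
# script above on two zoo price series, restricted to an optional date range.
spread_adf <- function(s1, s2, from = NULL, to = NULL) {
  # Keep only the dates present in both series (intersection).
  both <- merge(s1, s2, all = FALSE)
  colnames(both) <- c("stk1", "stk2")
  # Restrict to the requested range; NULL leaves that end open.
  both <- window(both, start = from, end = to)
  df <- as.data.frame(both)
  # Regression through the origin gives the assumed hedge ratio.
  beta <- unname(coef(lm(stk1 ~ stk2 + 0, data = df))[1])
  sprd <- df$stk1 - beta * df$stk2
  # ADF test on the spread, with no lagged differences as in the script above.
  ht <- adf.test(sprd, alternative = "stationary", k = 0)
  list(beta = beta, p.value = ht$p.value,
       start = start(both), end = end(both))
}

# Simulated prices (an assumption, for illustration only): both series share
# one random-walk component, so their spread should be stationary.
set.seed(1)
dates  <- seq(as.Date("2009-09-01"), by = "day", length.out = 300)
common <- cumsum(rnorm(300))
a <- zoo(50 + common + rnorm(300, sd = 0.5), dates)
b <- zoo(25 + 0.5 * common + rnorm(300, sd = 0.5), dates)

res <- spread_adf(a, b, from = as.Date("2009-10-01"), to = as.Date("2010-03-31"))
cat("Date range is", format(res$start), "to", format(res$end), "\n")
cat("Assumed hedge ratio is", res$beta, "\n")
cat("ADF p-value is", res$p.value, "\n")
```

With real data, stk1 and stk2 from my script would be passed in place of a and b. adf.test may warn that the p-value is smaller than the printed value when the spread is strongly stationary.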
What tools exist to make this easier and/or more robust?