I am using R to compute the tangency portfolio for all stocks in the S&P 500.
The list of equities is loaded by a Python script:
import urllib2
import pytz
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from datetime import datetime
from pandas.io.data import DataReader

SITE = "http://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
START = datetime(1900, 1, 1, 0, 0, 0, 0, pytz.utc)
END = datetime.today().utcnow()

def scrape_list(site):
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)
    table = soup.find('table', {'class': 'wikitable sortable'})
    sector_tickers = dict()
    for row in table.findAll('tr'):
        col = row.findAll('td')
        if len(col) > 0:
            sector = str(col[3].string.strip()).lower().replace(' ', '_')
            ticker = str(col[0].string.strip())
            if sector not in sector_tickers:
                sector_tickers[sector] = list()
            sector_tickers[sector].append(ticker)
    return sector_tickers

# export sp500 in a list
def get_sp500_all():
    sector_tickers = scrape_list(SITE)
    a = sector_tickers.values()
    b = a[0]
    for i in range(1, len(a)):
        b.extend(a[i])
    return b

if __name__ == '__main__':
    all_symbols = get_sp500_all()
    all_symbols.sort()
    print len(all_symbols)
    np.savetxt('all_symbols.csv',all_symbols, delimiter=',',fmt="%s")
The CSV containing all the tickers is then loaded into R, where I compute the tangency portfolio:
library(PerformanceAnalytics)
library(zoo)
library(tseries)
#load all the equities in SP500,
#file generate from python
# need transpose with t()
equity_list = t(read.csv("all_symbols.csv", header = FALSE, sep=","))
# start_date & end_date
start_date = "2012-10-30"
end_date = "2014-10-30"
# create a blank zoo type var
all_prices = zoo()
# download equity data from yahoo
for (i in 1:length(equity_list)){
  a = get.hist.quote(instrument = equity_list[i], start = start_date, end = end_date,
                     quote = "AdjClose", provider = "yahoo", origin = "1970-01-01",
                     compression = "m", retclass = "zoo")
  index(a) = as.yearmon(index(a))
  if (i == 1) {
    all_prices = a
  } else {
    all_prices = merge(all_prices, a)
  }
}
colnames(all_prices) = equity_list
# Calculate cc returns as difference in log prices
returns_df = diff(log(all_prices))
# remove the columns whose first return is NA
for (i in equity_list){
  if (is.na(returns_df[1, i])) {returns_df = returns_df[, colnames(returns_df) != i]}
}
# first forward-fill the NAs, then backward-fill any that remain;
# symbols with long runs of NA should be deleted instead (to be fixed later)
returns_df = na.locf(returns_df)
returns_df = na.locf(returns_df, fromLast = TRUE)
####################################################################################
# Parameters CER model
mu_hat_month = apply(returns_df, 2, mean)
sigma2_month = apply(returns_df, 2, var)
sigma_month = apply(returns_df, 2, sd)
cov_mat_month = var(returns_df)
cor_mat_month = cor(returns_df)
####################################################################################
#tangency portfolio
rf = 0.00001
mu2 = mu_hat_month - rf
tangency_portfolio = solve(cov_mat_month, mu2)
But I always get this error:
Error in solve.default(cov(returns_df), mu2) :
system is computationally singular: reciprocal condition number = 1.59968e-21
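One possibility I have not confirmed: the two-year monthly window above gives only about 24 return observations, while there are several hundred tickers. A sample covariance matrix built from T observations has rank at most T - 1, so it cannot be inverted when there are more assets than that. The sketch below illustrates the effect with NumPy on synthetic data; the sizes T = 24 and N = 100 and the random returns are assumptions made for illustration, not values taken from returns_df.

```python
import numpy as np

# Synthetic stand-in for returns_df: T monthly observations of N assets.
# T, N and the normal distribution are illustrative assumptions only.
rng = np.random.default_rng(0)
T, N = 24, 100
returns = rng.normal(loc=0.01, scale=0.05, size=(T, N))

# Sample covariance with assets in columns, analogous to var(returns_df) in R.
cov = np.cov(returns, rowvar=False)

# Demeaning removes one degree of freedom, so the rank is at most T - 1.
rank = np.linalg.matrix_rank(cov)
print("covariance shape:", cov.shape)
print("numerical rank:", rank)
print("full rank (invertible):", rank == N)
```

If the same check on the real data shows a rank well below the number of columns, the singularity would come from the ratio of observations to assets rather than from bad values in individual series.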
I also tried the function in tangency.portfolio.r:
##Tangency portfolio
# download tangency.portfolio.r
# https://r-forge.r-project.org/scm/viewvc.php/pkg/IntroCompFinR/R/tangency.portfolio.R?view=markup&root=introcompfinr
source("D:\\MOOC\\compfinance\\excise\\tangency.portfolio.r")
#The tangency portfolio
# risk free rate
t_bill_rate = 0.00001
# Tangency portfolio short sales allowed
tangency_portfolio_short = tangency.portfolio(mu_hat_month, cov_mat_month, risk.free=t_bill_rate, shorts=TRUE)
That gives this error:
Error in chol.default(cov.mat) :
the leading minor of order 25 is not positive definite
Something seems to be wrong with the data in returns_df, but I can't tell what. Can anyone help?
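For reference, here is a minimal sketch of the calculation I am attempting, written with NumPy on a case where the covariance matrix is invertible. It solves the same linear system as solve(cov_mat_month, mu2) and then rescales the result to sum to one, which is the textbook form of the tangency weights. The function name tangency_weights and the three-asset inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def tangency_weights(mu, cov, rf):
    """Weights proportional to inv(cov) @ (mu - rf), rescaled to sum to 1."""
    excess = np.asarray(mu, dtype=float) - rf
    # Same linear system as solve(cov_mat_month, mu2) in the R code.
    raw = np.linalg.solve(cov, excess)
    return raw / raw.sum()

# Hypothetical inputs: three assets with a positive definite covariance matrix.
mu = np.array([0.010, 0.015, 0.020])
cov = np.array([
    [0.0040, 0.0010, 0.0005],
    [0.0010, 0.0090, 0.0020],
    [0.0005, 0.0020, 0.0160],
])
rf = 0.00001

w = tangency_weights(mu, cov, rf)
print("weights:", w)
print("sum of weights:", w.sum())
```

With only a few assets and a well-conditioned covariance matrix the solve step goes through, so the failure above appears to be specific to the full S&P 500 input.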