
I'm using R and working with daily stock index time series from different countries. To compare the indexes (correlation, causality, etc.), I need all the series to have the same number of rows, but because holidays differ between countries, the number of rows differs from series to series.

I'm working with .csv files extracted from Yahoo Finance, which look like this:

> head(sp)
>           Date    Open    High     Low   Close     Volume Adj.Close
>1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
>1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
>1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14

For example, suppose 2010-01-07 is a holiday. In that case, the next line in the file (line 1285) is 2010-01-08:

> head(sp)
>           Date    Open    High     Low   Close     Volume Adj.Close
>1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
>1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
>1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
>1285 2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000   1144.98

I need to fill the gap at 2010-01-07 with the previous day's data, like this:

> head(sp)
>           Date    Open    High     Low   Close     Volume Adj.Close
>1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
>1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
>1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
>1285 2010-01-07 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
>1284 2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000   1144.98

How can I do this?

My code is below (note all the libraries I tried while trying to solve this problem):

>library(PerformanceAnalytics)
>library(tseries)
>library(urca)
>library(zoo)
>library(lmtest)
>library(timeDate)
>library(timeSeries)

>setwd("C:/Users/Fatima/Documents/R")

>sp = read.csv("SP500.csv", header = TRUE, stringsAsFactors = FALSE)
>sp$Date = as.Date(sp$Date)
>sp = sp[order(sp$Date), ]

Sorry about my bad English.


2 Answers


The xts package is useful here:

DF <- read.table(text = "           Date    Open    High     Low   Close     Volume Adj.Close
1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
1285 2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000   1144.98", header = TRUE)

DF$Date <- as.Date(DF$Date)

library(xts)
X <- as.xts(DF[,-1], order.by = DF$Date)
na.locf(merge(X, seq(min(DF$Date), max(DF$Date), by = 1)))
#              Open    High     Low   Close     Volume Adj.Close
#2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
#2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
#2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
#2010-01-07 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
#2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000   1144.98

Edit:

In response to your comment: you can exclude weekends like this:

dates <- seq(min(DF$Date), max(DF$Date), by = 1)
# weekdays() returns day names in the current locale, so you may need to adjust these strings
dates <- dates[!(weekdays(dates) %in% c("Saturday", "Sunday"))]
na.locf(merge(X, dates))
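
As the comment above notes, `weekdays()` returns locale-dependent names. One possible way around that, sketched here with base R only, is to use the numeric weekday from `as.POSIXlt()` instead. The date range below is made up for illustration; the resulting vector could be passed to `merge()` in place of `dates`.

```r
# Illustrative date range (an assumption, not taken from the real data)
dates <- seq(as.Date("2010-01-04"), as.Date("2010-01-11"), by = "day")

# as.POSIXlt()$wday is numeric: 0 = Sunday, ..., 6 = Saturday,
# so it does not depend on the language of the current locale
wday <- as.POSIXlt(dates)$wday

# Keep Monday to Friday only
weekdays_only <- dates[!(wday %in% c(0, 6))]
```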
answered 2015-03-19T13:24:25.227

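Read it in using read.zoo, then add the missing dates by merging it with a zero-width zoo series containing all the dates. Finally, use na.locf to fill in the NA values generated by the merge.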

Lines <- "Date,Open,High,Low,Close,Volume,Adj.Close
2010-01-04,1116.56,1133.87,1116.56,1132.99,3991400000,1132.99
2010-01-05,1132.66,1136.63,1129.66,1136.52,2491020000,1136.52
2010-01-06,1135.71,1139.19,1133.95,1137.14,4972660000,1137.14
2010-01-11,1140.52,1145.39,1136.22,1144.98,4389590000,1144.98"

library(zoo)
z <- read.zoo(text = Lines, header = TRUE, sep = ",")
zout <- na.locf( merge(z, zoo(, seq(start(z), end(z), by = "day"))) )

giving:

> zout
              Open    High     Low   Close     Volume Adj.Close
2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
2010-01-07 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
2010-01-08 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
2010-01-09 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
2010-01-10 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
2010-01-11 1140.52 1145.39 1136.22 1144.98 4389590000   1144.98

An alternative to the na.locf line is to use na.approx with method = "constant" instead:

na.approx(z, xout = seq(start(z), end(z), by = "day"), method = "constant")

which gives the same answer.

To set the weekends to NA:

library(chron)
zout[is.weekend(time(zout)), ] <- NA

or to return only the weekdays:

library(chron)
zout[!is.weekend(time(zout))]
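
Since the goal in the question is to compare indexes from several countries, the same merge-then-fill idea can be applied to two series at once. This is a minimal sketch with made-up prices and hypothetical names (`us`, `br`), not the asker's real data: merging two zoo series aligns them on the union of their dates, and `na.locf` then carries each index's last value across the other market's trading days.

```r
library(zoo)

# Hypothetical closing prices; each index has a different holiday
us <- zoo(c(100, 101, 103),
          as.Date(c("2010-01-04", "2010-01-05", "2010-01-07")))
br <- zoo(c(50, 52, 53),
          as.Date(c("2010-01-04", "2010-01-06", "2010-01-07")))

# merge() defaults to all = TRUE: union of dates, NA where a series has no row
both <- merge(us, br)

# Carry the last observation forward in each column;
# na.rm = FALSE keeps rows even if a column starts with NA
filled <- na.locf(both, na.rm = FALSE)
```

Both columns of `filled` now have one row per date in the union. Using `merge(us, br, all = FALSE)` instead would keep only the dates present in both series, which avoids filling but discards observations.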
answered 2015-03-19T13:48:48.390