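I am working with weather data and utility bills, and I am trying to estimate a nonlinear regression model.

I have run into a problem. The functions I call to compute the weather statistics, heating and cooling degree days (HDD and CDD), cannot be applied to a data frame, so nls cannot use them. Clearly I am missing something very obvious about function arguments.

Can anyone point out where I went wrong with the HDD and CDD functions below?

Below is a simple example of the problem, with code that generates fake weather and billing data.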

# Generate Fake Weather Data
CZ<-c(1,2)
Date<-c('2001-01-01','2001-01-02','2001-01-03','2001-01-04')
Weather<-expand.grid(CZ,Date)
names(Weather)<-c("CZ","Date")
Weather$AvgTemp<-rnorm(8,mean= 60,sd=20)

#Generate Fake Billing Data
ID<-as.numeric(1:10)
CZ<-c(1,2)
StartDate<-'2001-01-01'
EndDate<-'2001-02-01'
FakeBilling<-data.frame(cbind(ID,CZ,StartDate,EndDate))
FakeBilling$KWH<-rnorm(10,mean=1000, sd=200)

#Heating and cooling degree functions
HDD <- function(b, CZ, StartDate, EndDate) {
    Temps <- Weather$AvgTemp[Weather$CZ == CZ &
                             as.Date(Weather$Date) >= as.Date(StartDate) &
                             as.Date(Weather$Date) < as.Date(EndDate)]

    sum((b - Temps) / (1 + exp(-5 * (b - Temps))))
}


CDD <- function(b, CZ, StartDate, EndDate) {
    Temps <- Weather$AvgTemp[as.character(Weather$CZ) == as.character(CZ) &
                             as.Date(Weather$Date) >= as.Date(StartDate) &
                             as.Date(Weather$Date) < as.Date(EndDate)]

    sum((Temps - b) / (1 + exp(-5 * (Temps - b))))
}

#these work
HDD(60,1,'2001-01-01','2001-02-01')
# [1] 29.34333
CDD(60,1,'2001-01-01','2001-02-01')
# [1] 53.49393

# This does not. Lots of warnings about length
HDD(60,FakeBilling$CZ,FakeBilling$StartDate,FakeBilling$EndDate)
# [1] NA
# Warning messages:
#   1: In is.na(e1) | is.na(e2) :
#   longer object length is not a multiple of shorter object length
# 2: In `==.default`(Weather$CZ, CZ) :
#   longer object length is not a multiple of shorter object length
# 3: In `>=.default`(as.Date(Weather$Date), as.Date(StartDate)) :
#   longer object length is not a multiple of shorter object length
# 4: In `<.default`(as.Date(Weather$Date), as.Date(EndDate)) :
#   longer object length is not a multiple of shorter object length

# Would like to run this but get similar error.
nls(KWH ~ load + heatload * HDD(base, CZ, StartDate, EndDate), start = c(load = 200, heatload = .1, base = 65), data = FakeBilling, na.action = na.omit)
# Error in numericDeriv(form[[3L]], names(ind), env) : 
#   Missing value or an infinity produced when evaluating the model
# In addition: Warning messages:
#   1: In is.na(e1) | is.na(e2) :
#   longer object length is not a multiple of shorter object length
# 2: In `==.default`(Weather$CZ, CZ) :
#   longer object length is not a multiple of shorter object length
# 3: In `>=.default`(as.Date(Weather$Date), as.Date(StartDate)) :
#   longer object length is not a multiple of shorter object length
# 4: In `<.default`(as.Date(Weather$Date), as.Date(EndDate)) :
#   longer object length is not a multiple of shorter object length
# 5: In is.na(e1) | is.na(e2) :
#   longer object length is not a multiple of shorter object length
# 6: In `==.default`(Weather$CZ, CZ) :
#   longer object length is not a multiple of shorter object length
# 7: In `>=.default`(as.Date(Weather$Date), as.Date(StartDate)) :
#   longer object length is not a multiple of shorter object length
# 8: In `<.default`(as.Date(Weather$Date), as.Date(EndDate)) :
#   longer object length is not a multiple of shorter object length
1 Answer

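Your function is not written to be vectorized: it expects a single CZ, StartDate, and EndDate per call. You can use mapply() to call it once per row.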

with(FakeBilling, mapply(HDD, b = 60, CZ = CZ, StartDate = StartDate, EndDate = EndDate))
#----
[1] 29.33481 13.39434 29.33481 13.39434 29.33481 13.39434 29.33481 13.39434 29.33481 13.39434
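
To see why the original call warns, here is a minimal sketch with toy vectors. The names weather_cz, billing_cz, and count_matches are hypothetical and not part of the question's code. It compares a length-8 vector against a scalar, then against a length-10 vector, and finally lets mapply() do one scalar comparison per billing row.

```r
# Hypothetical toy vectors standing in for Weather$CZ and FakeBilling$CZ
weather_cz <- c(1, 2, 1, 2, 1, 2, 1, 2)   # 8 weather rows
billing_cz <- rep(c(1, 2), 5)             # 10 billing rows

# Scalar comparison: one TRUE/FALSE per weather row, as HDD() expects
mask_scalar <- weather_cz == 1

# Vector comparison: R recycles the shorter vector and raises a warning,
# which is captured here instead of being printed
msg <- tryCatch(
  { weather_cz == billing_cz; "no warning" },
  warning = function(w) conditionMessage(w)
)

# A scalar helper (illustrative only) applied once per billing row
count_matches <- function(cz) sum(weather_cz == cz)
per_row <- mapply(count_matches, billing_cz)

print(msg)
print(per_row)
```

The comparison inside HDD() behaves like the second case when whole columns are passed in, which is where the length warnings and the NA result come from.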

There is also Vectorize(), a function that gives an equivalent result:

HDDvec <- Vectorize(HDD)
HDDvec(60,FakeBilling$CZ,FakeBilling$StartDate,FakeBilling$EndDate)
#----
[1] 29.33481 13.39434 29.33481 13.39434 29.33481 13.39434 29.33481 13.39434 29.33481 13.39434
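
As a rough sketch of how the vectorized wrapper could then be plugged into the nls() call from the question, under a few assumptions: a fixed seed is added for reproducibility, and the fake billing data is built with data.frame() directly so that CZ stays numeric. With fake data like this there are only two distinct HDD values per base temperature, so the fit may well stop with a singular gradient or fail to converge. The call is wrapped in try() for that reason.

```r
set.seed(1)  # assumption: fixed seed, not in the original question

# Fake weather data, following the question
Weather <- expand.grid(CZ = c(1, 2),
                       Date = c("2001-01-01", "2001-01-02",
                                "2001-01-03", "2001-01-04"),
                       stringsAsFactors = FALSE)
Weather$AvgTemp <- rnorm(8, mean = 60, sd = 20)

# Fake billing data (built without cbind() so CZ remains numeric)
FakeBilling <- data.frame(ID = 1:10,
                          CZ = rep(c(1, 2), 5),
                          StartDate = "2001-01-01",
                          EndDate = "2001-02-01",
                          stringsAsFactors = FALSE)
FakeBilling$KWH <- rnorm(10, mean = 1000, sd = 200)

# Scalar HDD, as defined in the question
HDD <- function(b, CZ, StartDate, EndDate) {
  Temps <- Weather$AvgTemp[Weather$CZ == CZ &
                           as.Date(Weather$Date) >= as.Date(StartDate) &
                           as.Date(Weather$Date) < as.Date(EndDate)]
  sum((b - Temps) / (1 + exp(-5 * (b - Temps))))
}

# Wrapper that calls HDD() once per element of its arguments
HDDvec <- Vectorize(HDD)

# One HDD value per billing row, with no length warnings
hdd60 <- with(FakeBilling, HDDvec(60, CZ, StartDate, EndDate))

# Attempt the fit; try() keeps the script running if nls() gives up
fit <- try(
  nls(KWH ~ load + heatload * HDDvec(base, CZ, StartDate, EndDate),
      data = FakeBilling,
      start = c(load = 200, heatload = 0.1, base = 65)),
  silent = TRUE
)
```

If fit inherits from "try-error", the model is not identified by this toy data. Real billing data with many distinct billing periods and climate zones would be needed to estimate base.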
answered 2012-11-05T04:16:58.820