'Re: [R] Multiple regressions with changing dependent variable and time span'


List:       r-help
Subject:    Re: [R] Multiple regressions with changing dependent variable and time span
From:       arun <smartpink111 () yahoo ! com>
Date:       2013-11-30 22:28:18
Message-ID: 1385850498.46297.YahooMailNeo () web142604 ! mail ! bf1 ! yahoo ! com

Hi,
No problem.

In that case, each column will be a list. For example, if I take the first element of `lst2`:

dW1 <- rollapply(lst2[[1]],width=32,FUN=function(z) {z1 <- as.data.frame(z)
  if(!sum(!!rowSums(is.na(z1)))) {l1 <- lm(r~F.1+F.2+F.3,data=z1)
    durbinWatsonTest(l1,max.lag=3) } else rep(NA,4)},by.column=FALSE,align="right")

 tail(dW1[,1],1)
#[[1]]
#[1] -0.3602936  0.1975667 -0.1740797

You can store the results as a matrix with:
resdW1 <- do.call(cbind,lapply(seq_len(ncol(dW1)),function(i)
  do.call(rbind,dW1[,i]))[1:3])

Similarly, for more than one element (here using only a subset of lst2, as the computation takes time):

lst3 <- lapply(lst2[1:2],function(x) rollapply(x,width=32,FUN=function(z) {
  z1 <- as.data.frame(z)
  if(!sum(!!rowSums(is.na(z1)))) {l1 <- lm(r~F.1+F.2+F.3,data=z1)
    durbinWatsonTest(l1,max.lag=3) } else rep(NA,4)},by.column=FALSE,align="right"))

lst3New <- lapply(lst3,function(x) do.call(cbind,lapply(seq_len(ncol(x)),function(i)
  do.call(rbind,x[,i]))[1:3]))

# name the columns r1-r3, dw1-dw3, p1-p3 (one per lag)
lst3New <- lapply(lst3New, function(x) {
  colnames(x) <- paste0(rep(c("r","dw","p"),each=3),1:3); x})

A.K.
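The following variation is a sketch and does not come from the thread. It assumes, as the code above does, that durbinWatsonTest() returns a list whose components r, dw and p each hold one value per lag. Under that assumption, FUN can flatten the three components into a single named numeric vector. The helper name `flatten_dw` is hypothetical, and the mock object only imitates that structure using made-up numbers.

```r
# Hypothetical helper: turn a durbinWatsonTest-like list (components r, dw, p,
# each of length max.lag) into one named numeric vector.
flatten_dw <- function(d, max.lag = length(d$r)) {
  out <- c(d$r, d$dw, d$p)
  names(out) <- paste0(rep(c("r", "dw", "p"), each = max.lag), seq_len(max.lag))
  out
}

# Mock object imitating the structure (values are made up for illustration)
mock <- list(r = c(0.1, -0.2, 0.05), dw = c(1.8, 2.3, 1.9),
             p = c(0.6, 0.4, 0.8), alternative = "two.sided")
v <- flatten_dw(mock)
print(v)

# The NA branch of FUN would then need a vector of the same length
na_dw <- setNames(rep(NA_real_, 9), names(v))
```

In the rollapply() call, the fitted branch would return flatten_dw(durbinWatsonTest(l1, max.lag=3)) and the other branch a length-9 NA vector. With by.column=FALSE, the result should then be an ordinary numeric matrix with columns r1 to p3, so the do.call(cbind, ...) reshaping step would no longer be needed.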

On Saturday, November 30, 2013 5:03 PM, nooldor <nooldor@gmail.com> wrote:

Hey!

Yes,
only the D-W test takes so much time; I have not checked it yet.

I checked the results (estimates) against manually run regressions (in Excel), and they are correct.

I only changed "width" to 31 and "each=123" to 124, because the result should have ((154-31)+1) x 334 = 41416 rows.
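The row count follows from the window arithmetic. With n observations and a window of width w, there are n - w + 1 right-aligned windows. The check below confirms this in R; the helper name `n_windows` is only illustrative.

```r
# Number of rolling-window positions for n observations and window width w
n_windows <- function(n, w) n - w + 1

n_obs    <- 154   # observations per series
n_series <- 334   # dependent variables r.1 ... r.334

n_windows(n_obs, 32)               # 123, as in the original code (width=32)
n_windows(n_obs, 31)               # 124, for a 31-observation window
n_windows(n_obs, 31) * n_series    # 41416 rows in the stacked result
```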

Regarding the lag in the D-W test, I was wondering how to get a table when I use durbinWatsonTest(l1,3), i.e., with three lags instead of the default 1.
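For reference, the lag-k Durbin-Watson statistic is the sum of squared differences e_t - e_(t-k), divided by the sum of squared residuals e_t. car's durbinWatsonTest() also computes bootstrapped p-values, and its reps argument (1000 by default) sets how many bootstrap samples it draws. The bootstrap is probably why it is the slow step.

The base-R sketch below computes only the statistics for lags 1 to 3, using simulated data. The name `dw_lags` is hypothetical. Compare its output with durbinWatsonTest(fit, max.lag=3)$dw before relying on it.

```r
# Hypothetical helper: Durbin-Watson statistics for lags 1..max.lag,
# computed from the residuals of a fitted lm (no p-values).
dw_lags <- function(fit, max.lag = 3) {
  e <- residuals(fit)
  n <- length(e)
  den <- sum(e^2)
  sapply(seq_len(max.lag), function(k)
    sum((e[(k + 1):n] - e[1:(n - k)])^2) / den)
}

set.seed(1)
d <- data.frame(F.1 = rnorm(31), F.2 = rnorm(31), F.3 = rnorm(31))
d$r <- 1 + 0.5 * d$F.1 + rnorm(31)
fit <- lm(r ~ F.1 + F.2 + F.3, data = d)
dw <- dw_lags(fit, max.lag = 3)
names(dw) <- paste0("dw", 1:3)
dw
```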

But I can manage it; I just need to learn about the functions you used.

Anyway: BIG THANKS to you!

Best wishes,
T.S.

On 30 November 2013 21:12, arun <smartpink111@yahoo.com> wrote:

> Hi,
> 
> I was able to read the file after saving it as .csv. It seems to work without any errors.
> dat1<-read.csv("Book2.csv", header=T)
> ###same as previous
> 
> 
> lst1 <- lapply(paste("r",1:334,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
> library(zoo)
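The NA check used throughout works as follows. rowSums(is.na(x)) counts the missing values in each row, and the double negation `!!` turns those counts into TRUE/FALSE. The sum is therefore the number of rows with at least one NA. Base R's complete.cases() gives the same count, as this toy example with made-up values shows:

```r
# Toy data frame with missing values in three of the four rows
x <- data.frame(F.1 = c(1, NA, 3, 4),
                F.2 = c(5, 6, NA, 8),
                r   = c(NA, 2, 3, 4))

n_bad_1 <- sum(!!rowSums(is.na(x)))   # idiom used in the thread
n_bad_2 <- sum(!complete.cases(x))    # equivalent base-R formulation
c(n_bad_1, n_bad_2)                   # both 3
```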
> 
> res1 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {
>   z1 <- as.data.frame(z)
>   if(!sum(!!rowSums(is.na(z1)))) {l1 <- lm(r~F.1+F.2+F.3,data=z1)
>     c(coef(l1), pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared)
>   } else rep(NA,9)},by.column=FALSE,align="right")))
> row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
> dim(res1)
> #[1] 41082     9
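The nine columns of res1 hold the 4 coefficients, the 4 p-values (column 4 of the summary's coefficient table) and the R squared. The sketch below pulls the same per-window step out as a standalone function, so it can be checked on one simulated 32-row window without zoo. The name `window_stats` is hypothetical, and all(complete.cases(z1)) stands in for the equivalent NA test above.

```r
# Hypothetical helper mirroring the FUN used for res1:
# 4 coefficients + 4 p-values + R squared, or 9 NAs if any row is incomplete.
window_stats <- function(z1) {
  if (all(complete.cases(z1))) {
    l1 <- lm(r ~ F.1 + F.2 + F.3, data = z1)
    s <- summary(l1)
    c(coef(l1), pval = s$coefficients[, 4], rsquare = s$r.squared)
  } else rep(NA, 9)
}

set.seed(42)
z <- data.frame(F.1 = rnorm(32), F.2 = rnorm(32), F.3 = rnorm(32))
z$r <- 2 + z$F.1 - z$F.2 + rnorm(32)
out <- window_stats(z)
names(out)

z$F.2[5] <- NA            # one incomplete row -> all-NA result
na_out <- window_stats(z)
na_out
```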
> 
> #vif
> library(car)
> 
> res2 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {
>   z1 <- as.data.frame(z)
>   if(!sum(!!rowSums(is.na(z1)))) {l1 <- lm(r~F.1+F.2+F.3,data=z1); vif(l1)
>   } else rep(NA,3)},by.column=FALSE,align="right")))
> row.names(res2) <- rep(paste("r",1:334,sep="."),each=123)
> dim(res2)
> #[1] 41082     3
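The VIF of predictor j is 1/(1 - R_j^2), where R_j^2 is the R squared from regressing that predictor on the other predictors. F.1 to F.3 are all numeric, so the base-R sketch below, run on simulated data, can serve as a cross-check of vif(). The name `vif_manual` is hypothetical. car's vif() also handles factor terms, which this sketch does not.

```r
# Hypothetical helper: variance inflation factors for numeric predictors,
# VIF_j = 1 / (1 - R^2_j) with R^2_j from regressing x_j on the others.
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(nm) {
    others <- setdiff(names(X), nm)
    f <- reformulate(others, response = nm)
    1 / (1 - summary(lm(f, data = X))$r.squared)
  })
}

set.seed(7)
X <- data.frame(F.1 = rnorm(32))
X$F.2 <- X$F.1 + rnorm(32, sd = 0.5)   # deliberately correlated with F.1
X$F.3 <- rnorm(32)
v <- vif_manual(X)
v
```

With only two predictors, both VIFs reduce to 1/(1 - cor^2), which gives a simple sanity check.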
> 
> #DW statistic:
> lst3 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {
>   z1 <- as.data.frame(z)
>   if(!sum(!!rowSums(is.na(z1)))) {l1 <- lm(r~F.1+F.2+F.3,data=z1); durbinWatsonTest(l1)
>   } else rep(NA,4)},by.column=FALSE,align="right"))
> res3 <- do.call(rbind,lapply(lst3,function(x) x[,-4]))
> row.names(res3) <- rep(paste("r",1:334,sep="."),each=123)
> dim(res3)
> #[1] 41082     3
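> ##ncvTest()
> # helper: ncvTest() apparently looks up the model's data and formula outside
> # this function, so they are copied to the global environment temporarily
> # and removed again afterwards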
> f4 <- function(meanmod, dta, varmod) {
> assign(".dta", dta, envir=.GlobalEnv)
> assign(".meanmod", meanmod, envir=.GlobalEnv)
> m1 <- lm(.meanmod, .dta)
> ans <- ncvTest(m1, varmod)
> remove(".dta", envir=.GlobalEnv)
> remove(".meanmod", envir=.GlobalEnv)
> ans
> }
> 
> lst4 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {
>   z1 <- as.data.frame(z)
>   if(!sum(!!rowSums(is.na(z1)))) {l1 <- f4(r~.,z1) } else NA
> },by.column=FALSE,align="right"))
> names(lst4) <- paste("r",1:334,sep=".")
> length(lst4)
> #[1] 334
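As far as I know, when no variance formula is given, ncvTest() runs a score test (Cook-Weisberg, equivalent to Breusch-Pagan) of constant error variance against the fitted values. The test works in three steps:

1. Scale the squared residuals by their mean.
2. Regress them on the fitted values.
3. Compare half of that regression's sum of squares with a chi-squared distribution on 1 df.

The base-R sketch below performs this computation for a single window. The name `score_test` is hypothetical. Check its output against ncvTest() on the same model before using it.

```r
# Hypothetical helper: score test for non-constant variance against the
# fitted values (the test ncvTest() runs when no var.formula is given).
score_test <- function(fit) {
  e <- residuals(fit)
  U <- e^2 / (sum(e^2) / length(e))      # scaled squared residuals, mean 1
  aux <- lm(U ~ fitted(fit))             # auxiliary regression
  reg_ss <- sum((fitted(aux) - mean(U))^2)
  chisq <- reg_ss / 2
  c(Chisquare = chisq, Df = 1,
    p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

set.seed(3)
d <- data.frame(F.1 = runif(32, 1, 5), F.2 = rnorm(32), F.3 = rnorm(32))
d$r <- d$F.1 + rnorm(32, sd = d$F.1)     # error spread grows with F.1
st <- score_test(lm(r ~ F.1 + F.2 + F.3, data = d))
st
```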
> 
> 
> ###jarque.bera.test
> library(tseries)
> res5 <- do.call(rbind,lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {
>   z1 <- as.data.frame(z)
>   if(!sum(!!rowSums(is.na(z1)))) {l1 <- lm(r~F.1+F.2+F.3,data=z1); resid <- residuals(l1)
>     unlist(jarque.bera.test(resid)[1:3])
>   } else rep(NA,3)},by.column=FALSE,align="right")))
> dim(res5)
> #[1] 41082     3
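Three values are kept per window: the statistic, its degrees of freedom (2) and the p-value. The Jarque-Bera statistic is n*S^2/6 + n*(K - 3)^2/24, where S and K are the sample skewness and kurtosis computed from moments with divisor n. The base-R sketch below (the name `jb_stat` is hypothetical) should match jarque.bera.test() on the same residuals.

```r
# Hypothetical helper: Jarque-Bera normality test computed from moments
jb_stat <- function(x) {
  n  <- length(x)
  m  <- mean(x)
  m2 <- sum((x - m)^2) / n
  m3 <- sum((x - m)^3) / n
  m4 <- sum((x - m)^4) / n
  skew2 <- (m3 / m2^1.5)^2          # squared skewness
  kurt  <- m4 / m2^2                # (non-excess) kurtosis
  stat  <- n * skew2 / 6 + n * (kurt - 3)^2 / 24
  c(statistic = stat, parameter = 2,
    p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

jb <- jb_stat(c(-2, -1, 0, 1, 2))
jb   # statistic about 0.352 for this symmetric toy vector
```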
> 
> A.K.
> 
> On Saturday, November 30, 2013 1:44 PM, nooldor <nooldor@gmail.com> wrote:
> 
> Here it is in .xlsx format. It should be easy to open; if needed, you can find and replace the commas according to your Excel settings (or maybe Excel will do it automatically).
> 
> On 30 November 2013 19:15, arun <smartpink111@yahoo.com> wrote:
> 
> > I tried that, but:
> > 
> > 
> > 
> > dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
> > > str(dat1)
> > 'data.frame':    154 obs. of  1 variable:
> > 
> > Then I changed to:
> > dat1<-read.table("Book2.csv", head=T, sep="\t", dec=",")
> > > str(dat1)
> > 'data.frame':    154 obs. of  661 variables:
> > Both of them are wrong, as the number of variables should be 337.
> > A.K.
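When a file exported with comma-decimal Excel settings does not split as expected, it can help to count the fields per line before reading it. The sketch below uses a made-up two-row sample written to a temporary file. count.fields() and read.csv2() are standard utils functions, and read.csv2() already defaults to sep=";" and dec=",".

```r
# Made-up two-row sample in comma-decimal style: ';' between fields,
# ',' as the decimal mark
tf <- tempfile(fileext = ".csv")
writeLines(c("F.1;F.2;F.3;r.1", "0,5;1,25;-0,1;2", "0,7;1,10;0,3;NA"), tf)

# Fields per line for two candidate separators; for the real file
# (3 factors + 334 returns) the right separator should give 337 everywhere
table(count.fields(tf, sep = ";"))
table(count.fields(tf, sep = "\t"))

# read.csv2() uses sep = ";" and dec = "," by default
dat <- read.csv2(tf)
str(dat)
```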
> > 
> > On Saturday, November 30, 2013 12:53 PM, nooldor <nooldor@gmail.com> wrote:
> > 
> > Thank you,
> > 
> > I got your reply. I am just testing your script and will let you know soon how it goes.
> > .csv could be problematic, as commas are used as the decimal separator (Eastern European Excel settings) ... I read it in R with this: dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
> > Thank you very much !!!
> > 
> > T.S.
> > 
> > On 30 November 2013 18:39, arun <smartpink111@yahoo.com> wrote:
> > 
> > > I couldn't read the "Book.csv" as the format is completely messed up. Anyway, I hope the solution works on your dataset.
> > > 
> > > On Saturday, November 30, 2013 10:34 AM, nooldor <nooldor@gmail.com> wrote:
> > > 
> > > 
> > > OK.
> > > 
> > > > dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
> > > > colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
> > > > lst1 <- lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
> > > > lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> > > > sum(!!rowSums(is.na(lst2[[1]])))
> > > [1] 57
> > > > #[1] 40
> > > > sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
> > > [1] 57  0
> > > > #[1] 40 46
> > > The data file is attached.
> > > 
> > > On 30 November 2013 16:22, arun <smartpink111@yahoo.com> wrote:
> > > 
> > > > Hi,
> > > > The first point is not that clear.
> > > > 
> > > > Could you show the expected results in this case?
> > > > 
> > > > set.seed(432)
> > > > dat1 <- as.data.frame(matrix(sample(c(1:10,NA),154*5,replace=TRUE),ncol=5))
> > > > colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
> > > > lst1 <- lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
> > > > 
> > > > lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> > > > sum(!!rowSums(is.na(lst2[[1]])))
> > > > #[1] 40
> > > > sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
> > > > #[1] 40 46
> > > > 
> > > > 
> > > > A.K.
> > > > 
> > > > 
> > > > 
> > > > On Saturday, November 30, 2013 10:09 AM, nooldor <nooldor@gmail.com> wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > Thanks for the reply!
> > > > 
> > > > Three things:
> > > > 1.
> > > > I did not mention that some of the columns have more than 31 NAs, and then it is not possible to run lm():
> > > > Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :  0 (non-NA) cases
> > > > In this case the program should return "NA" and continue. Whenever fewer than 31 observations are available, the program should always return "NA" but continue.
> > > > 
> > > > 
> > > > 2. Your result matrix has only 4 columns (the coefficient estimates). Is it possible to add 4 more columns with the p-values and one column with the R squared?
> > > > 
> > > > 3. Basic statistical tests for the regressions:
> > > > 
> > > > Variance inflation factors can be obtained with:
> > > > res2 <- do.call(rbind,lapply(lst2,function(x)
> > > >   rollapply(x,width=32,FUN=function(z) vif(lm(r~F.1+F.2+F.3,data=as.data.frame(z))),
> > > >   by.column=FALSE,align="right")))
> > > > and the DW statistic with:
> > > > res3 <- do.call(rbind,lapply(lst2,function(x)
> > > >   rollapply(x,width=32,FUN=function(z) durbinWatsonTest(lm(r~F.1+F.2+F.3,data=as.data.frame(z))),
> > > >   by.column=FALSE,align="right")))
> > > > 
> > > > 3a) Is that right?
> > > > 
> > > > 3b) How can I run durbinWatsonTest for more than 1 lag and get the results in a user-friendly form?
> > > > 3c) How can I apply jarque.bera.test from library(tseries) and ncvTest from library(car)?
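On point 1: the code in this thread returns NA whenever a window contains any incomplete row. A possible alternative is to drop the incomplete rows and fit only if at least a minimum number of complete observations remain. This is an assumption about the desired behaviour, not the thread's method. `fit_or_na` and `min_obs` are hypothetical names.

```r
# Hypothetical helper: coefficients from complete rows only, or NAs when
# fewer than min_obs complete observations are left in the window.
fit_or_na <- function(z, min_obs = 31) {
  z <- z[complete.cases(z), , drop = FALSE]
  if (nrow(z) < min_obs) return(rep(NA_real_, 4))
  coef(lm(r ~ F.1 + F.2 + F.3, data = z))
}

set.seed(10)
w <- data.frame(F.1 = rnorm(31), F.2 = rnorm(31), F.3 = rnorm(31), r = rnorm(31))
length(fit_or_na(w))        # complete window: 4 coefficients
w$r[c(3, 8)] <- NA
fit_or_na(w)                # only 29 complete rows: all NA
fit_or_na(w, min_obs = 10)  # lower threshold: the model is fitted
```

This also avoids the "0 (non-NA) cases" error, because lm() is never called on a window with no complete rows.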
> > > > 
> > > > Regards,
> > > > 
> > > > Tomasz Schabek
> > > > 
> > > > 
> > > > On 30 November 2013 07:42, arun <smartpink111@yahoo.com> wrote:
> > > > 
> > > > > Hi,
> > > > > The link does not seem to work. From the description, it looks like:
> > > > > set.seed(432)
> > > > > dat1 <- as.data.frame(matrix(sample(200,154*337,replace=TRUE),ncol=337))
> > > > > colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:334,sep="."))
> > > > > lst1 <- lapply(paste("r",1:334,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
> > > > > lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> > > > > library(zoo)
> > > > > 
> > > > > res <- do.call(rbind,lapply(lst2,function(x)
> > > > >   rollapply(x,width=32,FUN=function(z) coef(lm(r~F.1+F.2+F.3,data=as.data.frame(z))),
> > > > >   by.column=FALSE,align="right")))
> > > > > row.names(res) <- rep(paste("r",1:334,sep="."),each=123)
> > > > > dim(res)
> > > > > #[1] 41082     4
> > > > > 
> > > > > coef(lm(r.1~F.1+F.2+F.3,data=dat1[1:32,]) )
> > > > > #(Intercept)         F.1         F.2         F.3
> > > > > #109.9168150  -0.1705361  -0.1028231   0.2027911
> > > > > coef(lm(r.1~F.1+F.2+F.3,data=dat1[2:33,]) )
> > > > > #(Intercept)         F.1         F.2         F.3
> > > > > #119.3718949  -0.1660709  -0.2059830   0.1338608
> > > > > res[1:2,]
> > > > > #    (Intercept)        F.1        F.2       F.3
> > > > > #r.1    109.9168 -0.1705361 -0.1028231 0.2027911
> > > > > #r.1    119.3719 -0.1660709 -0.2059830 0.1338608
> > > > > 
> > > > > A.K.
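The comparison above, between res[1:2,] and direct lm() calls on rows 1:32 and 2:33, can also be reproduced without zoo. Below is a minimal base-R sketch of the same right-aligned rolling regression for one response, using simulated data. The name `roll_coef` is hypothetical. Its numbers will not match the output above, because the simulated data differ.

```r
# Hypothetical helper: right-aligned rolling OLS coefficients for one
# response, using windows of w consecutive rows.
roll_coef <- function(d, w = 32) {
  starts <- seq_len(nrow(d) - w + 1)
  t(sapply(starts, function(i)
    coef(lm(r ~ F.1 + F.2 + F.3, data = d[i:(i + w - 1), ]))))
}

set.seed(432)
d <- as.data.frame(matrix(sample(200, 154 * 4, replace = TRUE), ncol = 4))
colnames(d) <- c("F.1", "F.2", "F.3", "r")
rc <- roll_coef(d, w = 32)
dim(rc)   # 123 windows x 4 coefficients
```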
> > > > > 
> > > > > On Friday, November 29, 2013 6:43 PM, nooldor <nooldor@gmail.com> wrote:
> > > > > Hi all!
> > > > > 
> > > > > 
> > > > > I am just starting my adventure with R, so excuse my naive questions.
> > > > > 
> > > > > My data look like this:
> > > > > 
> > > > > <http://r.789695.n4.nabble.com/file/n4681391/data_descr_img.jpg>
> > > > > 
> > > > > I have 3 independent variables (F.1, F.2 and F.3) and 334 other variables
> > > > > (r.1, r.2, ... r.334) - each one of these will be dependent variable in my
> > > > > regression.
> > > > > 
> > > > > The total time span is 154 observations, but I would like to run a rolling-window regression with a window length of 31 observations.
> > > > > 
> > > > > I would like to run a script like this:
> > > > > 
> > > > > summary(lm(r.1~F.1+F.2+F.3, data=data))
> > > > > vif(lm(r.1~F.1+F.2+F.3, data=data))
> > > > > 
> > > > > But for each of the 334 dependent variables (r.1 to r.334) separately, and with a
> > > > > rolling window of length 31 obs.
> > > > > 
> > > > > That is:
> > > > > summary(lm(r.1~F.1+F.2+F.3, data=data)) would be run 123 times (154 total obs - 31 for the first
> > > > > regression), each time on a rolling fixed period of 31 obs.
> > > > > 
> > > > > The next regression would be:
> > > > > summary(lm(r.2~F.1+F.2+F.3, data=data)) also 123 times ... and so on till
> > > > > summary(lm(r.334~F.1+F.2+F.3, data=data))
> > > > > 
> > > > > That means 123 x 334 = 41082 regressions.
> > > > > 
> > > > > I would like to save the results (summary + VIF) of all those 41082
> > > > > regressions in one reader-friendly file, like the output given by e.g. the command
> > > > > capture.output().
> > > > > 
> > > > > Could you help with it?
> > > > > 
> > > > > Regards,
> > > > > 
> > > > > T.S.
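One way to save the results is to combine a text report with a table. capture.output(..., file=) writes the printed summaries to a file, and write.csv() stores a stacked matrix such as res1. The sketch below uses two simulated regressions and temporary files; all object and file names are hypothetical.

```r
set.seed(5)
d <- data.frame(F.1 = rnorm(31), F.2 = rnorm(31), F.3 = rnorm(31),
                r.1 = rnorm(31), r.2 = rnorm(31))

# Fit one model per response column and keep them in a named list
fits <- lapply(c("r.1", "r.2"), function(y)
  lm(reformulate(c("F.1", "F.2", "F.3"), response = y), data = d))
names(fits) <- c("r.1", "r.2")

# Human-readable report: all printed summaries in one text file
txt_file <- tempfile(fileext = ".txt")
capture.output(lapply(fits, summary), file = txt_file)

# Machine-readable table: one row per model
coefs <- t(sapply(fits, coef))
csv_file <- tempfile(fileext = ".csv")
write.csv(coefs, csv_file)
back <- read.csv(csv_file, row.names = 1)
```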
> > > > > 
> > > > > 
> > > > > ______________________________________________
> > > > > R-help@r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > > > > 
> > > > 
> > > 
> > 
> 

