
List:       r-sig-teaching
Subject:    [R-sig-teaching] demonstration of weaknesses in stepwise variable selection
From:       John Maindonald <john.maindonald () anu ! edu ! au>
Date:       2018-10-03 20:18:30
Message-ID: 9B209EF6-5B42-4E16-B73F-38A6EEBA3491 () anu ! edu ! au

The functions bsnVaryNvar() and bestsetNoise() in the DAAG package
have been designed to highlight the opportunities that conventional
variable selection methods offer for generating "significant" effects from
data that is pure noise.  There is an accompanying vignette
simulate-varselect.
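
The point about "significant" effects from pure noise can be illustrated
with a minimal base-R sketch. This is not the DAAG implementation; it uses
step() from the stats package for backward selection, and the sample size,
number of candidate predictors, and seed are arbitrary assumptions chosen
for illustration.

```r
# Assumptions for illustration only: n, p and the seed are arbitrary.
set.seed(1)
n <- 100   # observations
p <- 30    # candidate predictors, all pure noise

# Response and predictors are independent standard normal draws,
# so no predictor has any real relationship with y.
noise <- as.data.frame(matrix(rnorm(n * (p + 1)), nrow = n))
names(noise) <- c("y", paste0("x", seq_len(p)))

# Backward stepwise selection (AIC-based) starting from the full model.
full <- lm(y ~ ., data = noise)
selected <- step(full, direction = "backward", trace = 0)

# Naive p-values reported for the retained predictors.
coefs <- summary(selected)$coefficients
pvals <- coefs[rownames(coefs) != "(Intercept)", "Pr(>|t|)", drop = FALSE]
print(pvals)

# Number of noise predictors that look "significant" at the 0.05 level.
n_sig <- sum(pvals < 0.05)
n_sig
```

Any predictor that survives selection here is a false positive by
construction, and the p-values printed for the final model take no account
of the selection that produced it.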

Standard forms of backward and forward regression variable selection,
and best subsets regression, continue to be used as the basis for
published work.  That is a sad commentary on how rarely results tossed
out by packaged software receive an informed and incisive critical
appraisal.


John Maindonald             email: john.maindonald@anu.edu.au

On 3/10/2018, at 23:00, r-sig-teaching-request@r-project.org wrote:

Message: 1
Date: Tue, 02 Oct 2018 10:54:36 -0500
From: "R. Mark Sharp" <rmsharp@me.com>
To: r-sig-teaching@r-project.org
Subject: [R-sig-teaching] demonstration of weaknesses in stepwise
variable selection
Message-ID: <485396B7-761E-499D-B580-BF7D3E912354@me.com>
Content-Type: text/plain; charset="us-ascii"

I am developing a short presentation for people with applied statistical backgrounds
who have used backward stepwise variable selection where they remove variables based
on small coefficient values, coefficient P values > 0.05, and large variances.

I want to provide some demonstration code in R that highlights some of the
weaknesses described by Frank Harrell (citation below).

Of particular interest are (1) failure to include informative predictor variables
(categorical and continuous) and (2) lowered standard errors for the coefficients in
the final model. I have code to demonstrate inclusion of too many false predictors.
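
One possible way to demonstrate points (1) and (2) is a small simulation,
sketched below in base R. Everything in it is an assumption made for
illustration: the true coefficients, sample size, number of noise variables,
and number of replications are arbitrary, only continuous predictors are
covered, and step() (AIC-based) stands in for whatever removal rule the
audience actually uses.

```r
# Assumptions for illustration only: all settings below are arbitrary.
set.seed(42)
n <- 60          # observations per simulated data set
n_sims <- 200    # number of simulated data sets
n_noise <- 10    # pure-noise candidate predictors
beta <- c(x1 = 0.5, x2 = 0.25, x3 = 0.25)   # true, informative effects
true_vars <- names(beta)
all_vars <- c(true_vars, paste0("z", seq_len(n_noise)))

# Storage: one row per simulation, one column per informative predictor.
kept <- matrix(FALSE, n_sims, length(beta), dimnames = list(NULL, true_vars))
se_full <- matrix(NA_real_, n_sims, length(beta),
                  dimnames = list(NULL, true_vars))
se_sel <- se_full
covered <- se_full

for (i in seq_len(n_sims)) {
  X <- matrix(rnorm(n * length(all_vars)), nrow = n,
              dimnames = list(NULL, all_vars))
  dat <- data.frame(y = drop(X[, true_vars] %*% beta) + rnorm(n), X)

  full <- lm(y ~ ., data = dat)
  sel <- step(full, direction = "backward", trace = 0)

  # Standard errors from the full model, before any selection.
  se_full[i, ] <- summary(full)$coefficients[true_vars, "Std. Error"]

  # Which informative predictors survived selection?
  co <- summary(sel)$coefficients
  k <- true_vars %in% rownames(co)
  kept[i, ] <- k

  if (any(k)) {
    v <- true_vars[k]
    se_sel[i, v] <- co[v, "Std. Error"]
    # Does the naive 95% interval from the selected model cover the truth?
    half <- qt(0.975, df.residual(sel)) * co[v, "Std. Error"]
    covered[i, v] <- abs(co[v, "Estimate"] - beta[v]) <= half
  }
}

result <- data.frame(
  retained_rate        = colMeans(kept),                    # point (1)
  mean_se_full         = colMeans(se_full),
  mean_se_selected     = colMeans(se_sel, na.rm = TRUE),    # point (2)
  ci_coverage_selected = colMeans(covered, na.rm = TRUE)
)
print(round(result, 3))
```

A retained_rate below 1 shows how often a genuinely informative predictor
is dropped. Comparing mean_se_selected with mean_se_full, and
ci_coverage_selected with the nominal 0.95, gives a rough view of how the
standard errors reported after selection behave. The selected-model columns
are averaged only over runs in which the predictor was retained.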

I expect this code is available, but I have not found it. Guidance would be
appreciated.

Mark
P.S. I have started a public GitHub package at https://github.com/rmsharp/stepwiser
It has very little in it thus far.


Frank E. Harrell. Regression Modeling Strategies: With Applications to Linear Models,
Logistic Regression, and Survival Analysis. Springer Series in Statistics.
Springer-Verlag. 2015.


R. Mark Sharp, Ph.D.
Data Scientist and Biomedical Statistical Consultant
7526 Meadow Green St.
San Antonio, TX 78251
mobile: 210-218-2868
rmsharp@me.com




_______________________________________________
R-sig-teaching@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-teaching

