'Re: [R-sig-Geo] [DKIM] Re: Fw: Why is there a large predictive difference for Univ. Kriging? [SEC=UN'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       r-sig-geo
Subject:    Re: [R-sig-Geo] [DKIM] Re: Fw: Why is there a large predictive difference for Univ. Kriging? [SEC=UN
From:       "Joelle k. Akram" <chino_tones () hotmail ! com>
Date:       2017-11-23 1:04:27
Message-ID: YTXPR0101MB1149A205A01299CC4CBA345190210 () YTXPR0101MB1149 ! CANPRD01 ! PROD ! OUTLOOK ! COM
[Download RAW message or body]

Lin,

Your review paper on SIM is very comprehensive and a good read. You touched on \
Irregular versus Regular System (section 3.1.7  in the paper). I do not have access \
to the Burrough and McDonnell book but from your experience; are there any tips you \
can suggest when handling irregular spatial points for kriging and in particular KED \
(such as in terms of modeling semivariograms, etc) ?

thanks,

Chris

________________________________
From: Li Jin <Jin.Li@ga.gov.au>
Sent: November 22, 2017 4:30 PM
To: Joelle k. Akram; Tomislav Hengl; r-sig-geo@r-project.org
Subject: RE: [DKIM] Re: [R-sig-Geo] Fw: Why is there a large predictive difference \
for Univ. Kriging? [SEC=UNCLASSIFIED]

That is something I am still working on.

Variances of the predictions by spatial predictive models are not well defined yet, \
and they are many ways to do this but none of them is satisfactory. Although people \
may argue that kriging methods can produce such variance, kriging variance in fact \
does not reflect the uncertainty expected. Please see Goovaerts 1997 ‘Geostatistics \
for Natural Resources Evaluation’ for details, or simply see my review ‘A Review of \
Spatial Interpolation Methods for Environmental Scientists’ for a short explanation.

From: Joelle k. Akram [mailto:chino_tones@hotmail.com]
Sent: Thursday, 23 November 2017 10:01 AM
To: Li Jin; Tomislav Hengl; r-sig-geo@r-project.org
Subject: [DKIM] Re: [DKIM] Re: [R-sig-Geo] Fw: Why is there a large predictive \
difference for Univ. Kriging? [SEC=UNCLASSIFIED]

Jin,

Is there any to get the variances of the predictions in spm?

________________________________

From: Li Jin <Jin.Li@ga.gov.au<mailto:Jin.Li@ga.gov.au>>
Sent: November 22, 2017 3:28 PM
To: Tomislav Hengl; Joelle k. Akram; \
                r-sig-geo@r-project.org<mailto:r-sig-geo@r-project.org>
Subject: RE: [DKIM] Re: [R-sig-Geo] Fw: Why is there a large predictive difference \
for Univ. Kriging? [SEC=UNCLASSIFIED]

Let try spm and see what we can achieve. All these scripts were directly modified \
from examples in spm.
> library(spm)
> library(sp)
> library(gstat)
> data(meuse)

> set.seed(999)
> rfcv1 <- RFcv(meuse[, c(5,4,7,8)], meuse[, 6], predacc = "ALL") # I used the same \
> predictors in the same order as in your model for comparison purpose. rfcv1$mae
[1] 53.54404 # This much lower than that for KED

> set.seed(999)
> rfcv1 <- rfokcv(meuse[, c(1,2)], meuse[, c(5,4,7,8)], meuse[, 6], predacc = "ALL")
> rfcv1$mae
[1] 42.22274 # This one further improved the accuracy in comparison with that for RF

> set.seed(999)
> rfcv1 <- rfidwcv(meuse[, c(1,2)], meuse[, c(5,4,7,8)], meuse[, 6], predacc = "ALL")
> rfcv1$mae
[1] 42.60406 # This one is similar to RFOK

You may try rfcv1$vecv for each method and see how accurate the models are.

I guess the results speak loudly what should be used.

-----Original Message-----
From: R-sig-Geo [mailto:r-sig-geo-bounces@r-project.org] On Behalf Of Tomislav Hengl
Sent: Thursday, 23 November 2017 8:42 AM
To: Joelle k. Akram; r-sig-geo@r-project.org<mailto:r-sig-geo@r-project.org>
Subject: [DKIM] Re: [R-sig-Geo] Fw: Why is there a large predictive difference for \
Univ. Kriging?

Any type of kriging is a convex predictor which means that predictions at sampling \
locations will exactly match measured numbers. That is why you get MAE_train = 0.

The actual MAE of your predictions is 85.9. This is not that bad considering that the \
range of values is: 113-1839. If your repeat the CV process e.g. 10 times you will \
get a more stable estimate of MAE. Even more interesting is the simple mean error \
(ME) which tells you whether there are over-estimation or under-estimation problem. \
Also plotting observed vs predicted (as in \
http://gsif.isric.org/lib/exe/detail.php/wiki:xyplot_predicted_vs_observerd_edgeroi.png?id=wiki%3Asoilmapping_using_mla)
 gives you graphical idea if there are any problems with your model.

HTH

Tom Hengl

On 2017-11-22 21:34, Joelle k. Akram wrote:
> Hi Tom,
> 
> 
> I tried splitting the data into 'training' set and a 'holdout' sample
> set as in my original post. I seem to be getting consistent results,
> i.e., a large predictive difference in terms of MAE between both sets.
> The MAE_train =0.0000000000001165816 and MAE_holdOut = 85.91126. In my
> opinion, this significant difference is an indication of over-fitting
> on the training sample set for the semi-variogram modeling. The code
> is below.  Any of your insights are welcome.
> 
> 
> demo(meuse, echo=FALSE)
> set.seed(999)
> sel.d = complete.cases(meuse@data[,c("lead","copper","elev",
> "dist")])
> meuse = meuse[sel.d,]
> Training_ids <- sample(seq_len(nrow(meuse)), size = (0.7*
> nrow(meuse)))
> Training_sample = meuse[Training_ids,]
> Holdout_sample = meuse[-Training_ids,]
> # Generate VGM using Training set
> Training_sample.geo <- as.geodata(Training_sample["zinc"])
> ## add covariates:
> Training_sample.geo$covariate =
> Training_sample@data[,c("lead","copper","elev", "dist")] trend = ~
> lead+copper+elev+dist
> zinc.vgm <- likfit(Training_sample.geo, lambda=0, trend = trend,
> 
> ini=c(var(log1p(Training_sample.geo$data)),800),
> fix.psiA = FALSE, fix.psiR = FALSE)
> 
> # do prediction for locations in Training set
> locs2 = Training_sample@coords
> KC = krige.control(trend.d = trend, trend.l = ~
> Training_sample$lead+Training_sample$copper+
> Training_sample$elev+Training_sample$dist,
> obj.model = zinc.vgm)
> zinc_train <- krige.conv(Training_sample.geo, locations=locs2,
> krige=KC)
> # do prediction for new locations in Hold-Out sample set
> newlocs2 = Holdout_sample@coords
> KC2 = krige.control(trend.d = trend, trend.l = ~
> Holdout_sample$lead+Holdout_sample$copper+
> Holdout_sample$elev+Holdout_sample$dist,
> obj.model = zinc.vgm)
> zinc_holdout <- krige.conv(Training_sample.geo, locations=newlocs2,
> krige=KC2)
> # Computing Predictive errors for Training and Hold Out samples
> respectively
> training_prediction_error_term <- Training_sample$zinc -
> zinc_train$predict
> holdout_prediction_error_term <- Holdout_sample$zinc -
> zinc_holdout$predict
> 
> # Function that returns Mean Absolute Error
> mae <- function(error)
> {
> mean(abs(error))
> }
> # Mean Absolute Error metric :
> # UK Predictive errors for Training sample set , and UK Predictive
> Errors for HoldOut sample set
> print(mae(training_prediction_error_term)) #Error for Training
> sample set
> print(mae(holdout_prediction_error_term)) #Error for Hold out sample
> set
> 
> 
> 
> 
> ----------------------------------------------------------------------
> --
> *From:* Tomislav Hengl <tom.hengl@gmail.com<mailto:tom.hengl@gmail.com>>
> *Sent:* November 22, 2017 8:17 AM
> *To:* Joelle k. Akram; r-sig-geo@r-project.org<mailto:r-sig-geo@r-project.org>
> *Subject:* Re: [R-sig-Geo] Fw: Why is there a large predictive
> difference for Univ. Kriging?
> 
> On 2017-11-22 13:11, Joelle k. Akram wrote:
> > 
> > Thank you Tom. A couple questions.
> > 1) In your code, you used log1p for computing  zinc.vgm. But log1p is
> > not used when defining trend.l for the krige.control 'KC'. Do we need
> > log1p for the zinc response (dependent) variable when defining trend.l?
> 
> Log-transformation in linkfit is defined by setting "lambda=0". I know
> it is a very cryptic package but it has all you need - transformation,
> back-transformation, REML fitting of variograms, trend components etc etc.
> 
> > 
> > 2) The exponent back-transform is not applied anywhere after applying
> > log1p for computing the  zinc.vgm variable. Do we need exp anywhere
> > or is it done internally?
> 
> It is done internally.
> 
> > 
> > 3) Do we add half the prediction variance to the 'zinc.uk' variable
> > or does geoR do this internally?
> 
> It is done internally see:
> "krige.conv: performing the Box-Cox data transformation
> krige.conv: back-transforming the predicted mean and variance"
> 
> > 
> > 4) Is it more advisable to use likfit instead of variofit and why?
> 
> likfit has probably more options for variogram modelling (these could
> get quite computational and I think fitting vgms with >>1000 geoR is
> not recommended, while in gstat it is still doable), but it could be a
> matter of taste.
> 
> > 
> > 5) A value of 800 is used to initialize likfit. Where is value determined?
> 
> Arbitrary initial parameter. It does not have to be very accurate.
> 
> > 
> > appreciated,
> > Chris
> > 
> > ---------------------------------------------------------------------
> > ---
> > *From:* R-sig-Geo \
> > <r-sig-geo-bounces@r-project.org<mailto:r-sig-geo-bounces@r-project.org>> on \
> > behalf of Tomislav Hengl <tom.hengl@gmail.com<mailto:tom.hengl@gmail.com>>
> > *Sent:* November 22, 2017 3:58 AM
> > *To:* r-sig-geo@r-project.org<mailto:r-sig-geo@r-project.org>
> > *Subject:* Re: [R-sig-Geo] Fw: Why is there a large predictive
> > difference for Univ. Kriging?
> > 
> > Hi Chris,
> > 
> > First of all, I think your back-transformation is not correct since
> > you need to add half the prediction variance to values as indicated
> > in the Diggle and Ribeiro (2007) P-61
> > (https://github.com/thengl/GeoMLA/blob/master/RF_vs_kriging/Diggle_Ri
> > beiro_2007_P61.png
> 
> > 
> > 
> > ).
> > Otherwise you underpredict the values and hence the cross-validation
> > error will be high.
> > 
> > I also do not see much point in using lead and copper as covariates
> > since they are only available at sampling locations.
> > 
> > For log-normal or not-normal variables I advise using geoR package
> > that does all the transformations for you (it would be interesting to
> > see if gstat and geoR give exactly the same numbers if the same
> > transformations and back-transformations are applied):
> > 
> > R> library(geoR)
> > --------------------------------------------------------------
> > Analysis of Geostatistical Data
> > For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
> > The geoR package - LEG-UFPR <http://www.leg.ufpr.br/geoR>
> > www.leg.ufpr.br<http://www.leg.ufpr.br> <http://www.leg.ufpr.br>  geoR is a free \
> > and open-source package for geostatistical analysis to be  used as a
> > add-on to the R system
> > 
> > 
> > 
> > geoR version 1.7-5.2 (built on 2016-05-02) is now loaded
> > --------------------------------------------------------------
> > 
> > R> demo(meuse, echo=FALSE)
> > R> set.seed(999)
> > R> sel.d = complete.cases(meuse@data[,c("lead","copper","elev",
> > R> "dist")]) meuse = meuse[sel.d,] meuse.geo <-
> > R> as.geodata(meuse["zinc"]) ## add covariates:
> > R> meuse.geo$covariate = meuse@data[,c("lead","copper","elev",
> > R> "dist")] trend = ~ lead+copper+elev+dist zinc.vgm <-
> > R> likfit(meuse.geo, lambda=0, trend = trend,
> > ini=c(var(log1p(meuse.geo$data)),800), fix.psiA = FALSE, fix.psiR =
> > FALSE)
> > ---------------------------------------------------------------
> > likfit: likelihood maximisation using the function optim.
> > likfit: Use control() to pass additional
> > arguments for the maximisation function.
> > For further details see documentation for optim.
> > likfit: It is highly advisable to run this function several
> > times with different initial values for the parameters.
[[elided Hotmail spam]]
> > ---------------------------------------------------------------
> > likfit: end of numerical maximisation.
> > R> zinc.vgm
> > likfit: estimated model parameters:
> > beta0      beta1      beta2      beta3      beta4      tausq
> > sigmasq        phi       psiA  "  6.0853" "  0.0033" "  0.0053" "
> > -0.0810" " -0.9805" "  0.0210" "
> > 0.0717" "799.9942" "  0.2619"
> > psiR
> > "  3.9731"
> > Practical Range with cor=0.05 for asymptotic range: 2396.568
> > 
> > likfit: maximised log-likelihood = -883.4
> > R> locs2 = meuse@coords
> > R> KC = krige.control(trend.d = trend, trend.l = ~
> > meuse$lead+meuse$copper+meuse$elev+meuse$dist, obj.model = zinc.vgm)
> > R> zinc.uk <- krige.conv(meuse.geo, locations=locs2, krige=KC)
> > krige.conv: model with mean defined by covariates provided by the
> > user
> > krige.conv: anisotropy correction performed
> > krige.conv: performing the Box-Cox data transformation
> > krige.conv: back-transforming the predicted mean and variance
> > krige.conv: Kriging performed using global neighbourhood
> > 
> > HTH,
> > 
> > Tom Hengl
> > http://orcid.org/0000-0002-9921-5129
> > Tomislav Hengl (0000-0002-9921-5129) - ORCID | Connecting ...
> > <http://orcid.org/0000-0002-9921-5129>
> > orcid.org
> > Your use of the Registry and the results of your search are subject
> > to ORCID's Terms and Conditions of Use
> > 
> > 
> > 
> > 
> > 
> > On 2017-11-22 01:08, Joelle k. Akram wrote:
> > > 
> > > 
> > > 
> > > down
> > > votefavorite<https://stackoverflow.com/questions/47424740/why-is-pre
<https://stackoverflow.com/questions/47424740/why-is-pre%0b>>>> \
dictive-error-large-for-universal-kriging#>
> 
> > 
> > <https://stackoverflow.com/questions/47424740/why-is-predictive-error
<https://stackoverflow.com/questions/47424740/why-is-predictive-error%0b>>> \
-large-for-universal-kriging#>
> > 
> > Why is Predictive error large for Universal Kriging?
> > <https://stackoverflow.com/questions/47424740/why-is-predictive-error
<https://stackoverflow.com/questions/47424740/why-is-predictive-error%0b>>> \
-large-for-universal-kriging#>
> > stackoverflow.com
> > I am using the Meuse dataset for universal kriging (UK) via the gstat
> > library in R. I am following a strategy used in Machine Learning
> > where data is partioned into a Train set and Hold out set. The...
> > 
> > 
> > 
> > > 
> > > 
> > > I am using the Meuse dataset for universal kriging (UK) via the gstat library \
> > > in R. I am following a strategy used in Machine Learning where data is \
> > > partioned into a Train set and Hold out set. The Train set is used for defining \
> > > the regressive model and  defining  the semivariogram. 
> > > I employ UK to predict on both the Train sample set, as well as the
> > > Hold Out sample set. However, there mean absolute error (MAE) from
> > > the predictions of the response variable (i.e., zinc for the Meuse
> > > dataset) and actual values are very different. I would expect them
> > > to be similar or at least closer. So far I have
> > MAE_training_set = 1 and MAE_holdOut_set = 76.5. My code is below and
> > advice is welcome.
> > > 
> > > library(sp)
> > > library(gstat)
> > > data(meuse)
> > > dataset= meuse
> > > set.seed(999)
> > > 
> > > # Split Meuse Dataset into Training and HoldOut Sample datasets
> > > Training_ids <- sample(seq_len(nrow(dataset)), size = (0.7*
> > > nrow(dataset)))
> > > 
> > > Training_sample = dataset[Training_ids,] Holdout_sample_allvars =
> > > dataset[-Training_ids,]
> > > 
> > > holdoutvars_df <-(dataset[,which(names(dataset) %in%
> > > c("x","y","lead","copper","elev","dist"))])
> > > Hold_out_sample = holdoutvars_df[-Training_ids,]
> > > 
> > > coordinates(Training_sample) <- c('x','y')
> > > coordinates(Hold_out_sample) <- c('x','y')
> > > 
> > > # Semivariogram modeling
> > > m1  <- variogram(log(zinc)~lead+copper+elev+dist, Training_sample) m
> > > <- vgm("Exp") m <- fit.variogram(m1, m)
> > > 
> > > 
> > > # Apply Univ Krig to Training dataset prediction_training_data <-
> > > krige(log(zinc)~lead+copper+elev+dist, Training_sample,
> > > Training_sample, model = m) prediction_training_data <-
> > > expm1(prediction_training_data$var1.pred)
> > > 
> > > # Apply Univ Krig to Hold Out dataset prediction_holdout_data <-
> > > krige(log(zinc)~lead+copper+elev+dist, Training_sample,
> > > Hold_out_sample, model = m) prediction_holdout_data <-
> > > expm1(prediction_holdout_data$var1.pred)
> > > 
> > > # Computing Predictive errors for Training and Hold Out samples
> > > respectively training_prediction_error_term <- Training_sample$zinc
> > > - prediction_training_data holdout_prediction_error_term <-
> > > Holdout_sample_allvars$zinc - prediction_holdout_data
> > > 
> > > 
> > > 
> > > # Function that returns Mean Absolute Error  mae <- function(error)
> > > {
> > > mean(abs(error))
> > > }
> > > 
> > > # Mean Absolute Error metric :
> > > # UK Predictive errors for Training sample set , and UK Predictive
> > > Errors for HoldOut sample set
> > > print(mae(training_prediction_error_term)) #Error for Training
> > > sample set
> > > print(mae(holdout_prediction_error_term)) #Error for Hold out sample
> > > set
> > > 
> > > 
> > > cheers,
> > > 
> > > Kristopher (Chris)
> > > 
> > > [[alternative HTML version deleted]]
> > > 
> > > _______________________________________________
> > > R-sig-Geo mailing list
> > > R-sig-Geo@r-project.org<mailto:R-sig-Geo@r-project.org>
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > R-sig-Geo Info Page - SfS – Seminar for Statistics | ETH ...
> > <https://stat.ethz.ch/mailman/listinfo/r-sig-geo>
> > stat.ethz.ch
> > A mailing list for discussing the development and use of R functions
> > and packages for handling and analysis of spatial, and particularly
> > geographical, data.
> > 
> > 
> > 
> > > 
> > 
> > _______________________________________________
> > R-sig-Geo mailing list
> > R-sig-Geo@r-project.org<mailto:R-sig-Geo@r-project.org>
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > R-sig-Geo Info Page - SfS – Seminar for Statistics | ETH ...
> > <https://stat.ethz.ch/mailman/listinfo/r-sig-geo>
> > stat.ethz.ch
> > A mailing list for discussing the development and use of R functions
> > and packages for handling and analysis of spatial, and particularly
> > geographical, data.
> > 
> > 
> > 

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org<mailto:R-sig-Geo@r-project.org>
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Geoscience Australia Disclaimer: This e-mail (and files transmitted with it) is \
intended only for the person or entity to which it is addressed. If you are not the \
intended recipient, then you have received this e-mail by mistake and any use, \
dissemination, forwarding, printing or copying of this e-mail and its file \
attachments is prohibited. The security of emails transmitted cannot be guaranteed; \
                by forwarding or replying to this email, you acknowledge and accept \
                these risks.
-------------------------------------------------------------------------------------------------------------------------

Geoscience Australia Disclaimer: This e-mail (and files transmitted with it) is \
intended only for the person or entity to which it is addressed. If you are not the \
intended recipient, then you have received this e-mail by mistake and any use, \
dissemination, forwarding, printing or copying of this e-mail and its file \
attachments is prohibited. The security of emails transmitted cannot be guaranteed; \
                by forwarding or replying to this email, you acknowledge and accept \
                these risks.
-------------------------------------------------------------------------------------------------------------------------

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

[prev in list] [next in list] [prev in thread] [next in thread]