
List:       r-sig-geo
Subject:    Re: [R-sig-Geo] Odd behavior of dismo's extract function
From:       Nick Matzke <matzke () nimbios ! org>
Date:       2016-07-25 2:35:04
Message-ID: CAJdu7BBs=qMhs+DH860XRLprDR1MB3FY0+Mko7BSaVgvLdtMOw () mail ! gmail ! com

I am on R 3.3.1 and don't get the huge time difference.  There is a tiny
decrease in elapsed time going from 250 to 251 points, though:

=============
wd = "~/Downloads/extract_weirdness/"
setwd(wd)

library(raster)
library(dismo)


extract.test <- function(env, N){
      extract(env, randomPoints(env, N))
}

env.files <- list.files(path = ".", pattern = "pc", full.names = TRUE)
env <- stack(env.files)


system.time(extract.test(env, 250))

   user  system elapsed
  1.455   0.043   1.492
Warning message:
In couldBeLonLat(mask) : CRS is NA. Assuming it is longitude/latitude


system.time(extract.test(env, 251))

   user  system elapsed
  1.137   0.033   1.158
Warning message:
In couldBeLonLat(mask) : CRS is NA. Assuming it is longitude/latitude
=============

...but I won't worry about it myself.  Thanks for the solution though, that
was weird behavior!
Nick
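
A note for anyone retracing this thread: extract.test() times
randomPoints() and extract() together, so the numbers quoted here cannot say
which of the two calls is responsible for the jump at 250 points. The sketch
below is one way to time them separately over a range of sample sizes. The
helper names time.split() and time.sweep() are made up for illustration, and
the small random two-layer stack is a stand-in (an assumption) so the code runs
without the Cuba rasters; the thread reports that simulated rasters did not
show the slowdown, so swap in the real stack to investigate it.

```r
library(raster)
library(dismo)

# Time randomPoints() and extract() separately for N points.
# (time.split and time.sweep are hypothetical helper names.)
time.split <- function(env, N) {
  t.points  <- system.time(pts  <- randomPoints(env, N))[["elapsed"]]
  t.extract <- system.time(vals <- extract(env, pts))[["elapsed"]]
  data.frame(numpoints = N, points.time = t.points, extract.time = t.extract)
}

# Run the split timing for each sample size and stack the rows.
time.sweep <- function(env, sizes) {
  do.call(rbind, lapply(sizes, function(n) time.split(env, n)))
}

# Stand-in data (assumption): a small two-layer stack of uniform random values.
r   <- raster(nrows = 100, ncols = 100, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
env <- stack(setValues(r, runif(ncell(r))), setValues(r, runif(ncell(r))))

timings <- time.sweep(env, c(1, 10, 250, 251, 1000))
print(timings)
```

To run it against the real data, build env with
stack(list.files(path = ".", pattern = "pc", full.names = TRUE)) as in the
code above and pass the sample sizes from Dan's table.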




On Mon, Jul 25, 2016 at 12:29 PM, Dan Warren <dan.l.warren@gmail.com> wrote:

> Updating to R 3.3.1 fixed it.  Thanks!  Still baffled as to why the sudden
> dropoff between 250 and 251, but as long as it's working all is well.
> 
> Cheers!
> 
> 
> On Mon, Jul 25, 2016 at 12:24 PM, Dan Warren <dan.l.warren@gmail.com>
> wrote:
> 
> > How very odd.  I'm using R 3.3.0, but as far as I can tell I'm using the
> > same package versions as you.  I've tried this on two machines (12 core
> > Mac Pro and an older Macbook Pro) and I'm getting the same phenomenon on
> > both.  Could it be a weird OSX thing?  I'll try updating R and then if it
> > still persists I'll bootcamp over into Windows and see if it's happening
> > for me there.
> > 
> > 
> > My session info (sorry for not including that the first time):
> > 
> > Session info ----------------------------------------------------
> > setting  value
> > version  R version 3.3.0 (2016-05-03)
> > system   x86_64, darwin13.4.0
> > ui       RStudio (0.99.491)
> > language (EN)
> > collate  en_AU.UTF-8
> > tz       Australia/Sydney
> > date     2016-07-25
> > 
> > Packages --------------------------------------------------------
> > package    * version date       source
> > colorspace   1.2-6   2015-03-11 CRAN (R 3.3.0)
> > devtools     1.12.0  2016-06-24 CRAN (R 3.3.0)
> > digest       0.6.9   2016-01-08 CRAN (R 3.3.0)
> > dismo      * 1.1-1   2016-06-16 CRAN (R 3.3.0)
> > ENMTools   * 0.1     2016-07-25 local
> > ggplot2    * 2.1.0   2016-03-01 CRAN (R 3.3.0)
> > gridExtra  * 2.2.1   2016-02-29 CRAN (R 3.3.0)
> > gtable       0.2.0   2016-02-26 CRAN (R 3.3.0)
> > highr        0.6     2016-05-09 CRAN (R 3.3.0)
> > knitr      * 1.13    2016-05-09 CRAN (R 3.3.0)
> > lattice      0.20-33 2015-07-14 CRAN (R 3.3.0)
> > memoise      1.0.0   2016-01-29 CRAN (R 3.3.0)
> > munsell      0.4.3   2016-02-13 CRAN (R 3.3.0)
> > plyr       * 1.8.4   2016-06-08 CRAN (R 3.3.0)
> > raster     * 2.5-8   2016-06-02 CRAN (R 3.3.0)
> > Rcpp         0.12.5  2016-05-14 CRAN (R 3.3.0)
> > rgeos      * 0.3-19  2016-04-04 CRAN (R 3.3.0)
> > rJava        0.9-8   2016-01-07 CRAN (R 3.3.0)
> > scales       0.4.0   2016-02-26 CRAN (R 3.3.0)
> > sp         * 1.2-3   2016-04-14 CRAN (R 3.3.0)
> > viridis    * 0.3.4   2016-03-12 CRAN (R 3.3.0)
> > withr        1.0.2   2016-06-20 CRAN (R 3.3.0)
> > 
> > 
> > On Mon, Jul 25, 2016 at 12:15 PM, Michael Sumner <mdsumner@gmail.com>
> > wrote:
> > 
> > > 
> > > 
> > > On Mon, 25 Jul 2016 at 11:35 Dan Warren <dan.l.warren@gmail.com> wrote:
> > > 
> > > > Just realized I pasted in the results backwards.  It should have been
> > > > 
> > > > system.time(extract.test(env, 250))
> > > > 
> > > > user  system elapsed
> > > > 124.562   0.516 125.061
> > > > 
> > > > system.time(extract.test(env, 251))
> > > > 
> > > > user  system elapsed
> > > > 2.807   0.084   2.891
> > > > 
> > > > 
> > > > 
> > > > 
> > > I don't see the effect.
> > > 
> > > Perhaps it was fixed in recent version of raster?
> > > 
> > > Please post reproducible details, I downloaded your data files to
> > > "test/testdata/" to try this.
> > > 
> > > Cheers, Mike.
> > > 
> > > 
> > > library(raster)
> > > library(dismo)
> > > extract.test <- function(env, N){
> > > extract(env, dismo::randomPoints(env, N))
> > > }
> > > 
> > > env.files <- list.files(path = "test/testdata/", pattern = "pc",
> > >                         full.names = TRUE)
> > > env <- raster::stack(env.files)
> > > 
> > > library(rbenchmark)
> > > benchmark(n250 = extract.test(env, 250),
> > > n251 = extract.test(env, 251), replications = 4)
> > > #   test replications elapsed relative user.self sys.self user.child sys.child
> > > # 1 n250            4    6.31    1.008      5.13     1.14         NA        NA
> > > # 2 n251            4    6.26    1.000      5.02     1.22         NA        NA
> > > devtools::session_info()
> > > # Session info --------------------------------------------------
> > > #   setting  value
> > > # version  R version 3.3.1 Patched (2016-07-09 r70874)
> > > # system   x86_64, mingw32
> > > # ui       RStudio (0.99.1261)
> > > # language (EN)
> > > # collate  English_Australia.1252
> > > # tz       Australia/Hobart
> > > # date     2016-07-25
> > > #
> > > # Packages ------------------------------------------------------
> > > #   package    * version date       source
> > > # devtools   * 1.12.0  2016-06-24 CRAN (R 3.3.1)
> > > # digest       0.6.9   2016-01-08 CRAN (R 3.3.1)
> > > # dismo      * 1.1-1   2016-06-16 CRAN (R 3.3.1)
> > > # evaluate     0.9     2016-04-29 CRAN (R 3.3.1)
> > > # htmltools    0.3.5   2016-03-21 CRAN (R 3.3.1)
> > > # knitr        1.13    2016-05-09 CRAN (R 3.3.1)
> > > # lattice      0.20-33 2015-07-14 CRAN (R 3.3.1)
> > > # magrittr     1.5     2014-11-22 CRAN (R 3.3.1)
> > > # memoise      1.0.0   2016-01-29 CRAN (R 3.3.1)
> > > # raster     * 2.5-8   2016-06-02 CRAN (R 3.3.1)
> > > # rbenchmark * 1.0.0   2012-08-30 CRAN (R 3.3.0)
> > > # Rcpp         0.12.5  2016-05-14 CRAN (R 3.3.1)
> > > # rgdal        1.1-10  2016-05-12 CRAN (R 3.3.1)
> > > # rmarkdown    1.0.2   2016-07-19 Github (rstudio/rmarkdown@b65e177)
> > > # sp         * 1.2-3   2016-04-14 CRAN (R 3.3.1)
> > > # stringi      1.1.1   2016-05-27 CRAN (R 3.3.0)
> > > # stringr      1.0.0   2015-04-30 CRAN (R 3.3.1)
> > > # withr        1.0.2   2016-06-20 CRAN (R 3.3.1)
> > > 
> > > 
> > > 
> > > 
> > > 
> > > > Dan Warren, Ph.D.
> > > > Department of Biology
> > > > Macquarie University
> > > > Email: dan.warren@mq.edu.au <dan.warren@anu.edu.au>
> > > > Phone (US): 530-848-3809
> > > > Phone (Australia): 0468 696 897
> > > > Phone (Work): 02 9850 8587
> > > > Skype: dan.l.warren
> > > > Google Scholar
> > > > <https://scholar.google.com/citations?user=NTzu9c8AAAAJ&hl=en>  Orcid
> > > > <http://orcid.org/0000-0002-8747-2451>  ResearcherID
> > > > <http://www.researcherid.com/rid/B-3821-2010>  Scopus
> > > > <http://www.scopus.com/authid/detail.url?authorId=7202133982>
> > > > 
> > > > 
> > > > On Mon, Jul 25, 2016 at 10:34 AM, Dan Warren <dan.l.warren@gmail.com>
> > > > wrote:
> > > > 
> > > > > This is not an error per se so much as just something very weird that
> > > > > I have noticed with a project I've been working on recently.  I'm
> > > > > wondering if anyone here has any insight as to what may be causing
> > > > > this behavior.  I haven't yet been able to duplicate it with simulated
> > > > > rasters (more info on that below), but it appears very reliably with
> > > > > real environmental data including the PC rasters for Cuba I have
> > > > > hosted here:
> > > > > 
> > > > > https://github.com/danlwarren/ENMTools/tree/master/test/testdata
> > > > > 
> > > > > What's happening is this: if I go to extract data from those rasters
> > > > > using occurrence points, the amount of time it takes increases very
> > > > > rapidly up to exactly 250 points, and falls dramatically after that.
> > > > > So dramatically that it takes over two minutes to extract data for 250
> > > > > points but just under three seconds for 251.  I've established that
> > > > > it's not a question of the points themselves being wonky, because it
> > > > > happens with random points as well.
> > > > > 
> > > > > 
> > > > > extract.test <- function(env, N){
> > > > > extract(env, randomPoints(env, N))
> > > > > }
> > > > > 
> > > > > env.files <- list.files(path = "testdata/", pattern = "pc",
> > > > >                         full.names = TRUE)
> > > > > env <- stack(env.files)
> > > > > 
> > > > > system.time(extract.test(env, 250))
> > > > > 
> > > > > user  system elapsed
> > > > > 2.807   0.084   2.891
> > > > > 
> > > > > system.time(extract.test(env, 251))
> > > > > 
> > > > > user  system elapsed
> > > > > 124.562   0.516 125.061
> > > > > 
> > > > > numpoints,time
> > > > > 1,1.54
> > > > > 5,3.93
> > > > > 10,6.764
> > > > > 50,29.939
> > > > > 100,61.431
> > > > > 150,79.295
> > > > > 200,110.283
> > > > > 250,120.118
> > > > > 251,2.748
> > > > > 252,2.756
> > > > > 254,2.767
> > > > > 500,2.876
> > > > > 1000,3.153
> > > > > 
> > > > > The data being extracted looks perfectly reasonable in all cases.
> > > > > It's not just these layers, either.  Although (as I mentioned above) I
> > > > > have yet to come up with simulated rasters that show this behavior, I
> > > > > see this behavior for both of the sets of rasters for real
> > > > > environmental data that I've tried.  The results above are from a PCA
> > > > > on Worldclim data for Cuba, but I just tried them on some Climond data
> > > > > I've got for Australia and I get the same behavior.  Those rasters are
> > > > > much larger, though, and as a result the times are longer; 251 points
> > > > > took about 43 seconds, whereas I just had to give up and stop the 250
> > > > > point extraction after about 30 minutes.
> > > > > 
> > > > > As for those simulated rasters, I've tried the following:
> > > > > 
> > > > > Plain grids of sequential numbers
> > > > > As above, but with a bunch of NAs added
> > > > > Filling the Cuban rasters with sequential numbers
> > > > > Filling the Cuban rasters with random numbers from a uniform (0,1)
> > > > > distribution
> > > > > 
> > > > > None of those show this issue.  Anyone have any thoughts about what
> > > > > might be going on here?
> > > > > 
> > > > > 
> > > > 
> > > > [[alternative HTML version deleted]]
> > > > 
> > > > _______________________________________________
> > > > R-sig-Geo mailing list
> > > > R-sig-Geo@r-project.org
> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > > > 
> > > --
> > > Dr. Michael Sumner
> > > Software and Database Engineer
> > > Australian Antarctic Division
> > > 203 Channel Highway
> > > Kingston Tasmania 7050 Australia
> > > 
> > > 
> > 
> 
> 


