[prev in list] [next in list] [prev in thread] [next in thread] 

List:       r-sig-geo
Subject:    Re: [R-sig-Geo] Antw: Re: Antw: Error when using rowSums along with a RasterStack object
From:       Matteo Mattiuzzi <matteo.mattiuzzi () boku ! ac ! at>
Date:       2012-05-29 13:20:02
Message-ID: 4FC4E9220200002700014349 () gwia2 ! boku ! ac ! at
[Download RAW message or body]

Ok, so even 33.8 % faster! :), cheers Matteo

> > > Thiago Veloso <thi_veloso@yahoo.com.br> 5/29/2012 3:03 pm >>>
  Matteo,

  I am very sorry for taking so long to answer - I took some days off starting in the \
same day you sent the message.

  I have tested the alternative you proposed. Indeed rowSums makes all the \
difference, take a look:

> s
class       : RasterStack 
dimensions  : 5568, 8289, 46153152, 45  (nrow, ncol, ncell, nlayers)
resolution  : 0.00898, 0.00898  (x, y)
extent      : -104.4326, -29.99736, -40.00064, 10  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
min values  : -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, \
-32768, -32768, -32768, -32768, -32768, -32768, ...  max values  : 32767, 32767, \
32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, \
32767, ...  layer names : gpp2001001.Gpp_1km, gpp2001009.Gpp_1km, gpp2001017.Gpp_1km, \
gpp2001025.Gpp_1km, gpp2001033.Gpp_1km, gpp2001041.Gpp_1km, gpp2001049.Gpp_1km, \
gpp2001057.Gpp_1km, gpp2001065.Gpp_1km, gpp2001073.Gpp_1km, gpp2001081.Gpp_1km, \
gpp2001089.Gpp_1km, gpp2001097.Gpp_1km, gpp2001105.Gpp_1km, gpp2001113.Gpp_1km, ... 

> fun <- function(x) {
+         x [x > 32760] <- NA
+         x <- x * 0.0001
+         x <- sum(x,na.rm=FALSE)
+ }

> fout  <- paste('/mnt/disco3/MODIS/MOSAIC/GPP/yearly/gpp_2001(sum).nc', sep='')

> system.time(x <- calc(s, fun, filename=fout))
    user   system  elapsed 
1485.619   47.935 3320.638

  Function 'sum' took ~55 minutes to integrate 45 images

> fun <- function(x) {
+         x [x > 32760] <- NA
+         x <- x * 0.0001
+         x <- rowSums(x,na.rm=FALSE)

+ }
> fout  <- paste('/mnt/disco3/MODIS/MOSAIC/GPP/yearly/gpp_2001(rowSums).nc', sep='')
> system.time(x <- calc(s, fun, filename=fout))

    user   system  elapsed 
393.477   79.516 2197.248 

  Same 45 images took ~36 minutes to be integrated using 'rowSums' function!!

  Thanks for this tip - it will certainly speed up my work!

  All the best,
  Thiago.
________________________________
From: Matteo Mattiuzzi <matteo.mattiuzzi@boku.ac.at>
To: Robert J. Hijmans <r.hijmans@gmail.com>; Thiago Veloso <thi_veloso@yahoo.com.br> 
Cc: "r-sig-geo@r-project.org" <r-sig-geo@r-project.org> 
Sent: Tuesday, May 15, 2012 9:28 AM
Subject: Re: [R-sig-Geo] Antw: Re: Antw: Error when using rowSums along with a \
RasterStack object


Hi Thiago, have you tested this?

# vectorized version. Here x is a matrix! (1 "chunk" of data, depend on \
blockSize(data).) fun <- function(x) {
        x[x>32761] <- NA
        x <- x * 0.0001
        x <- rowSums(x,na.rm=FALSE)
}
x <- calc(s, fun,forceFun=TRUE)

I'm confident that you will win another 30% of time...let me know I'm curious.

Matteo

> > > Thiago Veloso <thi_veloso@yahoo.com.br> 5/15/2012 1:27 pm >>>
  Robert and Matteo,

  Thanks for your suggestions. I ended up running the old version of my script, \
which, after some crop operations, takes 'only' 40 minutes per annual-summation \
relative to the old 51-minute mark.

  Although your suggestions on memory optimization are really tempting, the server I \
use is shared with my workgroup and I don't want to create a memory black hole!

  For everyone's appreciation, I am pasting the code in the end of this message.

  All the best,
  Thiago.

# loading required packages (raster will do the most of the processing, 

# rgdal is used to load images, ncdf writes the resulting netcdf files
# and maptools is used to load shapefiles which crops the images.)
library(raster)
library(rgdal)
library(ncdf)
library(maptools)

# set the working directory
setwd("/mnt/disco3/MODIS/MOSAIC/GPP")

# load and project shapefiles to crop images later
sa<-readShapePoly("/mnt/disco3/MODIS/shapes/southamerica.shp")
br<-readShapePoly("/mnt/disco3/MODIS/shapes/brazil.shp")
shape_proj<-raster('/mnt/disco3/MODIS/MOSAIC/GPP/gpp2001001.Gpp_1km.tif')
projection(sa)<-projection(shape_proj)
projection(br)<-projection(shape_proj)

# define a function to perform common tasks
fun <- function(x) {
        x [x > 32760] <- NA #remove unvalid range of values (water, unclassified \
                pixes etc): 32760 for GPP and  for LAI
        x <- x * 0.0001 #scale factor for MODIS GPP product: 0.0001 for GPP and 0.1 \
                for LAI
        x <- sum(x,na.rm=FALSE) #integrating 46 satellite images takes 51 minutes \
(3,104 seconds) to run }

# Once the function has been defined, it should be relatively
# straightforward to loop over years to perform annual integration
files <- list.files(pattern='.Gpp_1km.tif',  full.names=TRUE)
years <- substr(files, 6, 9)
for (y in 2001:2011) {
       f <- files[ years == y ]
       # stack one year's all files
       s <- stack(f) #stacking 46 images takes 10 seconds to run
       s <- crop(s,br)
       fout  <- paste('/mnt/disco3/MODIS/MOSAIC/GPP/yearly/gpp_', y,'.nc', sep='')
       x <- calc(s, fun, filename=fout)
}
________________________________
From: Robert J. Hijmans <r.hijmans@gmail.com>
To: thi_veloso@yahoo.com.br
Cc: r-sig-geo@r-project.org; Matteo Mattiuzzi <matteo.mattiuzzi@boku.ac.at>
Sent: Monday, May 14, 2012 1:25 PM
Subject: Re: [R-sig-Geo] Antw: Re: Antw: Error when using rowSums along with a \
RasterStack object


Thiago, 

I forgot to mention that you may be able to further speed things up by processing the \
data in bigger chunks, using 

setOptions(chunksize=   )

The default setting is somewhat conservative, to avoid errors from occurring, but if \
you want to live in the fast lane, give it a try.

Also, Matteo pointed out to me that  calc(x, sum)  also uses rowSums; so speed should \
be equivalent for sum and calc. calc has the advantage of being able to set a \
filename. 

Robert


On Sun, May 13, 2012 at 9:03 AM, Robert J. Hijmans <r.hijmans@gmail.com> wrote:

sum should be faster than calc(x, sum) because sum uses rowSums whereas calc, in this \
case, would call apply(values(x), 1, sum). calc(x, function(i) rowSums(i)) should be \
equivalent to sum (in speed and results). 
> 
> 
> Whether making a RasterBrick is worth it depends on how many times you will use \
> these files for this type of computation. If it is only for one computation it \
> should be slower to first make bricks, as the main bottleneck in these computations \
> is disk I/O. I cannot think of another way to speed things up right now --- \
> something we are working through clustering (probably not too helpful here), and C \
> routines. 
> 
> 30 hours is long, but it is the computer that's working, not you.
> 
> Robert
> 
> 
> 
> 
> On Sun, May 13, 2012 at 4:46 AM, Matteo Mattiuzzi <matteo.mattiuzzi@boku.ac.at> \
> wrote: 
> Dear Thiago,
> > 
> > I don't know what makes sum() more suitable comparing to calc() but I think a \
> > fakt that you are useing a stack (and not a brick) influences also the speed of \
> > processing: 
> > ######################
> > 
> > library(raster)
> > r <- raster()
> > r[] <- 1:ncell(r)
> > 
> > setOptions(todisk=TRUE)
> > system.time(st <- stack(r,r,r,r,r,r,r,r,r))
> > system.time(br <- brick(r,r,r,r,r,r,r,r,r))
> > 
> > system.time(sum(st,na.rm=FALSE))
> > system.time(sum(br,na.rm=FALSE))
> > 
> > system.time(calc(st,sum,na.rm=FALSE))
> > system.time(calc(br,sum,na.rm=FALSE))
> > 
> > setOptions(todisk=FALSE)
> > #######################
> > 
> > # Of coarse the creation of a brick takes a bit of time since the data needs to \
> > be re-organised physically on the disk. # But this initial overhead worthwhile \
> > relatively quickly, you have to check this. 
> > # Cheers Matteo
> > 
> > 
> > 
> > 
> > > > > Thiago Veloso  13.05.12 1.53 Uhr >>>
> > 
> > Robert,
> > Thanks for your input. Your suggestion is more suitable to my case, but it takes \
> > almost one hour to sum a stack containing 46 satellite images. Please see below:
> > > sclass       : RasterStack dimensions  : 5568, 8289, 46153152, 46  (nrow, ncol, \
> > > ncell, nlayers)resolution  : 0.00898, 0.00898  (x, y)extent      : -104.4326, \
> > > -29.99736, -40.00064, 10  (xmin, xmax, ymin, ymax)coord. ref. : +proj=longlat \
> > > +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 min values  : -32768, -32768, \
> > > -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, \
> > > -32768, -32768, -32768, ... max values  : 32767, 32767, 32767, 32767, 32767, \
> > > 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, ... layer \
> > > names : gpp2004001.Gpp_1km, gpp2004009.Gpp_1km, gpp2004017.Gpp_1km, \
> > > gpp2004025.Gpp_1km, gpp2004033.Gpp_1km, gpp2004041.Gpp_1km, gpp2004049.Gpp_1km, \
> > > gpp2004057.Gpp_1km, gpp2004065.Gpp_1km, gpp2004073.Gpp_1km, gpp2004081.Gpp_1km, \
> > > gpp2004089.Gpp_1km, gpp2004097.Gpp_1km, gpp2004105.Gpp_1km, gpp2004113.Gpp_1km, \
> > > ...
> > 
> > For reference, total size of the 46 images which compose the stack reaches 4GB. \
> > Here is how long it takes to perform the sum over the stack:
> > > system.time(sos <- sum(s,na.rm=FALSE))    user   system  elapsed  743.098   \
> > > 31.258 3104.113
> > The machine I am using to process this dataset is an 3GHz quadcore with 16GB of \
> > RAM, running Opensuse and latest R (2.15.0). As processing my entire data \
> > involves performing this operation at least 30 times, is there a way to optimize \
> > the process? All the best,  Thiago.
> > 
> > --- On Sat, 12/5/12, Robert J. Hijmans  wrote:
> > 
> > From: Robert J. Hijmans
> > 
> > Subject: Re: [R-sig-Geo] Antw: Error when using rowSums along with a RasterStack \
> >                 object
> > To: thi_veloso@yahoo.com.br
> > Cc: r-sig-geo@r-project.org
> > Date: Saturday, 12 May, 2012, 13:43
> > 
> > You can also do
> > sos <- sum(s, na.rm=FALSE)
> > Best, Robert
> > 
> > 
> > 
> > On Sat, May 12, 2012 at 2:26 AM, Matteo Mattiuzzi  wrote:
> > 
> > Dear Thiago,
> > 
> > 
> > 
> > is this what you need?
> > 
> > ####
> > 
> > library(raster)
> > 
> > r <- raster()
> > 
> > r[]<-1:ncell(r)
> > 
> > r2 <- raster()
> > 
> > r2[]<-ncell(r):1
> > 
> > r <-brick(r,r2)
> > 
> > plot(r)
> > 
> > 
> > 
> > sos <- calc(r,sum)
> > 
> > x11()
> > 
> > plot(sos)
> > 
> > ####
> > 
> > Matteo
> > 
> > 
> > 
> > sos <- calc(r,sum)
> > 
> > plot(sos)
> > 
> > 
> > 
> > > > > Thiago Veloso  11.05.12 23.29 Uhr >>>
> > 
> > Dear R-colleagues,
> > 
> > 
> > 
> > I am trying to sum all layers of a RasterStack object, which is summarized below:
> > 
> > 
> > 
> > > s
> > 
> > class       : RasterStack
> > 
> > dimensions  : 5568, 8289, 46153152, 46  (nrow, ncol, ncell, nlayers)
> > 
> > resolution  : 0.00898, 0.00898  (x, y)
> > 
> > extent      : -104.4326, -29.99736, -40.00064, 10  (xmin, xmax, ymin, ymax)
> > 
> > coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
> > 
> > min values  : -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, \
> > -32768, -32768, -32768, -32768, -32768, -32768, -32768, ... 
> > max values  : 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, \
> > 32767, 32767, 32767, 32767, 32767, 32767, ... 
> > layer names : gpp2004001.Gpp_1km, gpp2004009.Gpp_1km, gpp2004017.Gpp_1km, \
> > gpp2004025.Gpp_1km, gpp2004033.Gpp_1km, gpp2004041.Gpp_1km, gpp2004049.Gpp_1km, \
> > gpp2004057.Gpp_1km, gpp2004065.Gpp_1km, gpp2004073.Gpp_1km, gpp2004081.Gpp_1km, \
> > gpp2004089.Gpp_1km, gpp2004097.Gpp_1km, gpp2004105.Gpp_1km, gpp2004113.Gpp_1km, \
> > ... 
> > 
> > 
> > 
> > The raster 's' is composed by 46 satellite images at a resolution of 1km \
> > (0.00898degree). Each layer is a 8-day composition of leaf area index and I need \
> > to integrate an entire year of LAI. When I try to run the "rowSums" function in \
> > this RasterStack, the following error is displayed: 
> > 
> > 
> > 
> > > sos <- rowSums(s,na.rm=FALSE)
> > 
> > Error in rowSums(s, na.rm = FALSE) :
> > 
> > 'x' must be an array of at least two dimensions
> > 
> > 
> > 
> > The same occurs when I try to use .rowSums providing matrix dimensions:
> > 
> > 
> > 
> > > sos <- .rowSums(s,5568,8289)
> > 
> > Error in .rowSums(s, 5568, 8289) : 'x' must be numeric
> > 
> > 
> > 
> > 
> > 
> > What am I doing wrong? Is there another way to integrate the layers of a \
> > RasterStack object? 
> > 
> > 
> > Thanks in advance,
> > 
> > Thiago Veloso.
> > 
> > 
> > 
> > P.S.: Attaching sessionInfo output to provide useful information:
> > 
> > 
> > 
> > > sessionInfo()
> > 
> > R version 2.15.0 (2012-03-30)
> > 
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> > 
> > 
> > 
> > locale:
> > 
> > [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> > 
> > [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> > 
> > [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> > 
> > [7] LC_PAPER=C                 LC_NAME=C
> > 
> > [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > 
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> > 
> > 
> > 
> > attached base packages:
> > 
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> > 
> > 
> > 
> > other attached packages:
> > 
> > [1] ncdf_1.6.6    rgdal_0.7-11  raster_1.9-92 sp_0.9-99
> > 
> > 
> > 
> > loaded via a namespace (and not attached):
> > 
> > [1] grid_2.15.0    lattice_0.20-6
> > 
> > 
> > 
> > 
> > 
> > _______________________________________________
> > 
> > R-sig-Geo mailing list
> > 
> > R-sig-Geo@r-project.org
> > 
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > 
> > 
> > 
> > 
> > 
> > [[alternative HTML version deleted]]
> > 
> > 
> > 
> > _______________________________________________
> > 
> > R-sig-Geo mailing list
> > 
> > R-sig-Geo@r-project.org
> > 
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > 
> > 
> > 
> > 
> > 
   [[alternative HTML version deleted]]
> > 
> > 
> > 
> > 
> > [[alternative HTML version deleted]]
> > 
> > _______________________________________________
> > R-sig-Geo mailing list
> > R-sig-Geo@r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> > 
> 

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic