[prev in list] [next in list] [prev in thread] [next in thread]
List: r-sig-geo
Subject: Re: [R-sig-Geo] Antw: Re: Antw: Error when using rowSums along with a RasterStack object
From: Matteo Mattiuzzi <matteo.mattiuzzi () boku ! ac ! at>
Date: 2012-05-29 13:20:02
Message-ID: 4FC4E9220200002700014349 () gwia2 ! boku ! ac ! at
[Download RAW message or body]
Ok, so even 33.8 % faster! :), cheers Matteo
> > > Thiago Veloso <thi_veloso@yahoo.com.br> 5/29/2012 3:03 pm >>>
Matteo,
I am very sorry for taking so long to answer - I took some days off starting in the \
same day you sent the message.
I have tested the alternative you proposed. Indeed rowSums makes all the \
difference, take a look:
> s
class : RasterStack
dimensions : 5568, 8289, 46153152, 45 (nrow, ncol, ncell, nlayers)
resolution : 0.00898, 0.00898 (x, y)
extent : -104.4326, -29.99736, -40.00064, 10 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
min values : -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, \
-32768, -32768, -32768, -32768, -32768, -32768, ... max values : 32767, 32767, \
32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, \
32767, ... layer names : gpp2001001.Gpp_1km, gpp2001009.Gpp_1km, gpp2001017.Gpp_1km, \
gpp2001025.Gpp_1km, gpp2001033.Gpp_1km, gpp2001041.Gpp_1km, gpp2001049.Gpp_1km, \
gpp2001057.Gpp_1km, gpp2001065.Gpp_1km, gpp2001073.Gpp_1km, gpp2001081.Gpp_1km, \
gpp2001089.Gpp_1km, gpp2001097.Gpp_1km, gpp2001105.Gpp_1km, gpp2001113.Gpp_1km, ...
> fun <- function(x) {
+ x [x > 32760] <- NA
+ x <- x * 0.0001
+ x <- sum(x,na.rm=FALSE)
+ }
> fout <- paste('/mnt/disco3/MODIS/MOSAIC/GPP/yearly/gpp_2001(sum).nc', sep='')
> system.time(x <- calc(s, fun, filename=fout))
user system elapsed
1485.619 47.935 3320.638
Function 'sum' took ~55 minutes to integrate 45 images
> fun <- function(x) {
+ x [x > 32760] <- NA
+ x <- x * 0.0001
+ x <- rowSums(x,na.rm=FALSE)
+ }
> fout <- paste('/mnt/disco3/MODIS/MOSAIC/GPP/yearly/gpp_2001(rowSums).nc', sep='')
> system.time(x <- calc(s, fun, filename=fout))
user system elapsed
393.477 79.516 2197.248
Same 45 images took ~36 minutes to be integrated using 'rowSums' function!!
Thanks for this tip - it will certainly speed up my work!
All the best,
Thiago.
________________________________
From: Matteo Mattiuzzi <matteo.mattiuzzi@boku.ac.at>
To: Robert J. Hijmans <r.hijmans@gmail.com>; Thiago Veloso <thi_veloso@yahoo.com.br>
Cc: "r-sig-geo@r-project.org" <r-sig-geo@r-project.org>
Sent: Tuesday, May 15, 2012 9:28 AM
Subject: Re: [R-sig-Geo] Antw: Re: Antw: Error when using rowSums along with a \
RasterStack object
Hi Thiago, have you tested this?
# vectorized version. Here x is a matrix! (1 "chunk" of data, depend on \
blockSize(data).) fun <- function(x) {
x[x>32761] <- NA
x <- x * 0.0001
x <- rowSums(x,na.rm=FALSE)
}
x <- calc(s, fun,forceFun=TRUE)
I'm confident that you will win another 30% of time...let me know I'm curious.
Matteo
> > > Thiago Veloso <thi_veloso@yahoo.com.br> 5/15/2012 1:27 pm >>>
Robert and Matteo,
Thanks for your suggestions. I ended up running the old version of my script, \
which, after some crop operations, takes 'only' 40 minutes per annual-summation \
relative to the old 51-minute mark.
Although your suggestions on memory optimization are really tempting, the server I \
use is shared with my workgroup and I don't want to create a memory black hole!
For everyone's appreciation, I am pasting the code in the end of this message.
All the best,
Thiago.
# loading required packages (raster will do the most of the processing,
# rgdal is used to load images, ncdf writes the resulting netcdf files
# and maptools is used to load shapefiles which crops the images.)
library(raster)
library(rgdal)
library(ncdf)
library(maptools)
# set the working directory
setwd("/mnt/disco3/MODIS/MOSAIC/GPP")
# load and project shapefiles to crop images later
sa<-readShapePoly("/mnt/disco3/MODIS/shapes/southamerica.shp")
br<-readShapePoly("/mnt/disco3/MODIS/shapes/brazil.shp")
shape_proj<-raster('/mnt/disco3/MODIS/MOSAIC/GPP/gpp2001001.Gpp_1km.tif')
projection(sa)<-projection(shape_proj)
projection(br)<-projection(shape_proj)
# define a function to perform common tasks
fun <- function(x) {
x [x > 32760] <- NA #remove unvalid range of values (water, unclassified \
pixes etc): 32760 for GPP and for LAI
x <- x * 0.0001 #scale factor for MODIS GPP product: 0.0001 for GPP and 0.1 \
for LAI
x <- sum(x,na.rm=FALSE) #integrating 46 satellite images takes 51 minutes \
(3,104 seconds) to run }
# Once the function has been defined, it should be relatively
# straightforward to loop over years to perform annual integration
files <- list.files(pattern='.Gpp_1km.tif', full.names=TRUE)
years <- substr(files, 6, 9)
for (y in 2001:2011) {
f <- files[ years == y ]
# stack one year's all files
s <- stack(f) #stacking 46 images takes 10 seconds to run
s <- crop(s,br)
fout <- paste('/mnt/disco3/MODIS/MOSAIC/GPP/yearly/gpp_', y,'.nc', sep='')
x <- calc(s, fun, filename=fout)
}
________________________________
From: Robert J. Hijmans <r.hijmans@gmail.com>
To: thi_veloso@yahoo.com.br
Cc: r-sig-geo@r-project.org; Matteo Mattiuzzi <matteo.mattiuzzi@boku.ac.at>
Sent: Monday, May 14, 2012 1:25 PM
Subject: Re: [R-sig-Geo] Antw: Re: Antw: Error when using rowSums along with a \
RasterStack object
Thiago,
I forgot to mention that you may be able to further speed things up by processing the \
data in bigger chunks, using
setOptions(chunksize= )
The default setting is somewhat conservative, to avoid errors from occurring, but if \
you want to live in the fast lane, give it a try.
Also, Matteo pointed out to me that calc(x, sum) also uses rowSums; so speed should \
be equivalent for sum and calc. calc has the advantage of being able to set a \
filename.
Robert
On Sun, May 13, 2012 at 9:03 AM, Robert J. Hijmans <r.hijmans@gmail.com> wrote:
sum should be faster than calc(x, sum) because sum uses rowSums whereas calc, in this \
case, would call apply(values(x), 1, sum). calc(x, function(i) rowSums(i)) should be \
equivalent to sum (in speed and results).
>
>
> Whether making a RasterBrick is worth it depends on how many times you will use \
> these files for this type of computation. If it is only for one computation it \
> should be slower to first make bricks, as the main bottleneck in these computations \
> is disk I/O. I cannot think of another way to speed things up right now --- \
> something we are working through clustering (probably not too helpful here), and C \
> routines.
>
> 30 hours is long, but it is the computer that's working, not you.
>
> Robert
>
>
>
>
> On Sun, May 13, 2012 at 4:46 AM, Matteo Mattiuzzi <matteo.mattiuzzi@boku.ac.at> \
> wrote:
> Dear Thiago,
> >
> > I don't know what makes sum() more suitable comparing to calc() but I think a \
> > fakt that you are useing a stack (and not a brick) influences also the speed of \
> > processing:
> > ######################
> >
> > library(raster)
> > r <- raster()
> > r[] <- 1:ncell(r)
> >
> > setOptions(todisk=TRUE)
> > system.time(st <- stack(r,r,r,r,r,r,r,r,r))
> > system.time(br <- brick(r,r,r,r,r,r,r,r,r))
> >
> > system.time(sum(st,na.rm=FALSE))
> > system.time(sum(br,na.rm=FALSE))
> >
> > system.time(calc(st,sum,na.rm=FALSE))
> > system.time(calc(br,sum,na.rm=FALSE))
> >
> > setOptions(todisk=FALSE)
> > #######################
> >
> > # Of coarse the creation of a brick takes a bit of time since the data needs to \
> > be re-organised physically on the disk. # But this initial overhead worthwhile \
> > relatively quickly, you have to check this.
> > # Cheers Matteo
> >
> >
> >
> >
> > > > > Thiago Veloso 13.05.12 1.53 Uhr >>>
> >
> > Robert,
> > Thanks for your input. Your suggestion is more suitable to my case, but it takes \
> > almost one hour to sum a stack containing 46 satellite images. Please see below:
> > > sclass : RasterStack dimensions : 5568, 8289, 46153152, 46 (nrow, ncol, \
> > > ncell, nlayers)resolution : 0.00898, 0.00898 (x, y)extent : -104.4326, \
> > > -29.99736, -40.00064, 10 (xmin, xmax, ymin, ymax)coord. ref. : +proj=longlat \
> > > +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 min values : -32768, -32768, \
> > > -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, \
> > > -32768, -32768, -32768, ... max values : 32767, 32767, 32767, 32767, 32767, \
> > > 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, ... layer \
> > > names : gpp2004001.Gpp_1km, gpp2004009.Gpp_1km, gpp2004017.Gpp_1km, \
> > > gpp2004025.Gpp_1km, gpp2004033.Gpp_1km, gpp2004041.Gpp_1km, gpp2004049.Gpp_1km, \
> > > gpp2004057.Gpp_1km, gpp2004065.Gpp_1km, gpp2004073.Gpp_1km, gpp2004081.Gpp_1km, \
> > > gpp2004089.Gpp_1km, gpp2004097.Gpp_1km, gpp2004105.Gpp_1km, gpp2004113.Gpp_1km, \
> > > ...
> >
> > For reference, total size of the 46 images which compose the stack reaches 4GB. \
> > Here is how long it takes to perform the sum over the stack:
> > > system.time(sos <- sum(s,na.rm=FALSE)) user system elapsed 743.098 \
> > > 31.258 3104.113
> > The machine I am using to process this dataset is an 3GHz quadcore with 16GB of \
> > RAM, running Opensuse and latest R (2.15.0). As processing my entire data \
> > involves performing this operation at least 30 times, is there a way to optimize \
> > the process? All the best, Thiago.
> >
> > --- On Sat, 12/5/12, Robert J. Hijmans wrote:
> >
> > From: Robert J. Hijmans
> >
> > Subject: Re: [R-sig-Geo] Antw: Error when using rowSums along with a RasterStack \
> > object
> > To: thi_veloso@yahoo.com.br
> > Cc: r-sig-geo@r-project.org
> > Date: Saturday, 12 May, 2012, 13:43
> >
> > You can also do
> > sos <- sum(s, na.rm=FALSE)
> > Best, Robert
> >
> >
> >
> > On Sat, May 12, 2012 at 2:26 AM, Matteo Mattiuzzi wrote:
> >
> > Dear Thiago,
> >
> >
> >
> > is this what you need?
> >
> > ####
> >
> > library(raster)
> >
> > r <- raster()
> >
> > r[]<-1:ncell(r)
> >
> > r2 <- raster()
> >
> > r2[]<-ncell(r):1
> >
> > r <-brick(r,r2)
> >
> > plot(r)
> >
> >
> >
> > sos <- calc(r,sum)
> >
> > x11()
> >
> > plot(sos)
> >
> > ####
> >
> > Matteo
> >
> >
> >
> > sos <- calc(r,sum)
> >
> > plot(sos)
> >
> >
> >
> > > > > Thiago Veloso 11.05.12 23.29 Uhr >>>
> >
> > Dear R-colleagues,
> >
> >
> >
> > I am trying to sum all layers of a RasterStack object, which is summarized below:
> >
> >
> >
> > > s
> >
> > class : RasterStack
> >
> > dimensions : 5568, 8289, 46153152, 46 (nrow, ncol, ncell, nlayers)
> >
> > resolution : 0.00898, 0.00898 (x, y)
> >
> > extent : -104.4326, -29.99736, -40.00064, 10 (xmin, xmax, ymin, ymax)
> >
> > coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
> >
> > min values : -32768, -32768, -32768, -32768, -32768, -32768, -32768, -32768, \
> > -32768, -32768, -32768, -32768, -32768, -32768, -32768, ...
> > max values : 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, 32767, \
> > 32767, 32767, 32767, 32767, 32767, 32767, ...
> > layer names : gpp2004001.Gpp_1km, gpp2004009.Gpp_1km, gpp2004017.Gpp_1km, \
> > gpp2004025.Gpp_1km, gpp2004033.Gpp_1km, gpp2004041.Gpp_1km, gpp2004049.Gpp_1km, \
> > gpp2004057.Gpp_1km, gpp2004065.Gpp_1km, gpp2004073.Gpp_1km, gpp2004081.Gpp_1km, \
> > gpp2004089.Gpp_1km, gpp2004097.Gpp_1km, gpp2004105.Gpp_1km, gpp2004113.Gpp_1km, \
> > ...
> >
> >
> >
> > The raster 's' is composed by 46 satellite images at a resolution of 1km \
> > (0.00898degree). Each layer is a 8-day composition of leaf area index and I need \
> > to integrate an entire year of LAI. When I try to run the "rowSums" function in \
> > this RasterStack, the following error is displayed:
> >
> >
> >
> > > sos <- rowSums(s,na.rm=FALSE)
> >
> > Error in rowSums(s, na.rm = FALSE) :
> >
> > 'x' must be an array of at least two dimensions
> >
> >
> >
> > The same occurs when I try to use .rowSums providing matrix dimensions:
> >
> >
> >
> > > sos <- .rowSums(s,5568,8289)
> >
> > Error in .rowSums(s, 5568, 8289) : 'x' must be numeric
> >
> >
> >
> >
> >
> > What am I doing wrong? Is there another way to integrate the layers of a \
> > RasterStack object?
> >
> >
> > Thanks in advance,
> >
> > Thiago Veloso.
> >
> >
> >
> > P.S.: Attaching sessionInfo output to provide useful information:
> >
> >
> >
> > > sessionInfo()
> >
> > R version 2.15.0 (2012-03-30)
> >
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> >
> >
> > locale:
> >
> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> >
> > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> >
> > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> >
> > [7] LC_PAPER=C LC_NAME=C
> >
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> >
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> >
> >
> > attached base packages:
> >
> > [1] stats graphics grDevices utils datasets methods base
> >
> >
> >
> > other attached packages:
> >
> > [1] ncdf_1.6.6 rgdal_0.7-11 raster_1.9-92 sp_0.9-99
> >
> >
> >
> > loaded via a namespace (and not attached):
> >
> > [1] grid_2.15.0 lattice_0.20-6
> >
> >
> >
> >
> >
> > _______________________________________________
> >
> > R-sig-Geo mailing list
> >
> > R-sig-Geo@r-project.org
> >
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
> >
> >
> >
> >
> > [[alternative HTML version deleted]]
> >
> >
> >
> > _______________________________________________
> >
> > R-sig-Geo mailing list
> >
> > R-sig-Geo@r-project.org
> >
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
> >
> >
> >
> >
[[alternative HTML version deleted]]
> >
> >
> >
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > R-sig-Geo@r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
>
[[alternative HTML version deleted]]
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic