List: r-devel
Subject: [Rd] order(..., na.last = NA) performance hit
From: Murat Tasan <mmuurr () gmail ! com>
Date: 2015-01-19 20:20:31
Message-ID: CA+YV+HyRjMbB+A4wRxCm1woZSHmxCQerShLm10ynYgVeSqXHdg () mail ! gmail ! com
I recently noticed that calling order with na.last = NA incurs a HUGE
performance hit.

It appears that much of order(...) (the R wrapper, not the internal
calls) is written as generally as possible in order to handle the large
number of input types. But the canonical case of ordering a single
numeric vector suffers greatly under the current implementation.

Below is a single trivial example (roughly a 6X slowdown), but overall
I've been seeing slowdowns on the order of 10X when using na.last = NA.

Would it be worth (i) attempting a rewrite of the wrapping order(...)
function, or (ii) at least mentioning the performance implications in
the help page for order(...)?

Here's an example of the performance hit:

x <- runif(1e6)
x[runif(1e6) > 0.9] <- NA  ## add some (~10%) NA values

order2 <- function(x) {
  iix <- order(x, na.last = TRUE)
  iix[!is.na(x[iix])]
}

system.time(y1 <- order(x, na.last = TRUE))
##    user  system elapsed
##    0.48    0.00    0.48

system.time(y2 <- order(x, na.last = NA))
##    user  system elapsed
##   3.060   0.056   3.118

system.time(y3 <- order2(x))
##    user  system elapsed
##   0.520   0.004   0.520

all(y2 == y3)
## [1] TRUE
identical(y2, y3)
## [1] TRUE

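
For the single-vector case, another possible workaround is to drop the
NAs before ordering and then map the result back to positions in the
original vector. The sketch below is only an illustration: order_drop_na
is a hypothetical helper name (not part of base R), and it has not been
timed here, so any speed benefit is an assumption to check with
system.time on your own data.

```r
## Hypothetical helper (not in base R): order only the non-NA values,
## then translate back to indices into the original vector.
order_drop_na <- function(x, decreasing = FALSE) {
  keep <- which(!is.na(x))  ## original positions of the non-NA values
  keep[order(x[keep], decreasing = decreasing)]
}

set.seed(1)
x <- runif(1e5)
x[runif(1e5) > 0.9] <- NA  ## add some (~10%) NA values

## Should agree with the built-in na.last = NA behaviour.
stopifnot(identical(order(x, na.last = NA), order_drop_na(x)))
```

Like order2 above, this only covers a single sort key; ordering on
several vectors at once would need the NA positions combined across all
of them first.
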
Cheers,
-murat
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel