List: r-devel
Subject: Re: [Rd] Which function can change RNG state?
From: Paul Gilbert <pgilbert902 () gmail ! com>
Date: 2015-02-09 5:03:11
Message-ID: 54D83F8F.2020605 () gmail ! com
On 02/08/2015 09:33 AM, Dirk Eddelbuettel wrote:
>
> On 7 February 2015 at 19:52, otoomet wrote:
> | random numbers. For instance, can I be sure that
> | set.seed(0); print(runif(1)); print(rnorm(1))
> | will always print the same numbers, also in the future version of R? There
>
> Yes, pretty much.
This is nearly correct. The user could switch the uniform or normal generator to one of the non-default options, and the results would then be different. And obviously, if they changed the print precision, the printed result might be rounded differently.
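To make the first point concrete, here is a minimal base-R sketch showing that switching only the normal generator with RNGkind() changes rnorm() output for the same seed, while the uniform draw from the default Mersenne-Twister is unchanged. The seed 0 and the choice of "Box-Muller" as the alternative are arbitrary; draw_pair() is a hypothetical helper, not part of R.

```r
# Hypothetical helper: one uniform and one normal after seeding,
# mirroring set.seed(0); runif(1); rnorm(1) from the question
draw_pair <- function(seed) {
  set.seed(seed)
  c(u = runif(1), z = rnorm(1))
}

default_draws <- draw_pair(0)

# Switch only the normal generator; the uniform generator stays as it was.
# RNGkind() invisibly returns the kinds in effect *before* the call.
old_kinds <- RNGkind(normal.kind = "Box-Muller")
bm_draws <- draw_pair(0)

# Restore the previous generators
do.call(RNGkind, as.list(old_kinds))

print(default_draws)
print(bm_draws)   # same u, different z
```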
I think you could prepare for future versions of R by saving information about the generators you are using. The precedent has already been set (R-1.7.0) that the default could change if there is a good reason. A good reason might be that the RNG is found not to be so good relative to others that become available. But I think the old generator would continue to be available, so people can reproduce old results. (Package setRNG has some utilities to help save and reset, but there is nothing especially difficult or fancy, just a few details that need to be remembered.)
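The bookkeeping described above can be sketched in base R alone: record the generator kinds and the seed next to your results, and restore both before rerunning. The helpers save_rng_info() and restore_rng_info() are hypothetical names for illustration, not functions from setRNG or base R.

```r
# Hypothetical helper: capture what is needed to replay a seeded run
save_rng_info <- function(seed) {
  list(kinds = RNGkind(),   # uniform, normal (and, in R >= 3.6.0, sample) kinds
       seed = seed,
       r_version = R.version.string)
}

# Hypothetical helper: re-establish the recorded generators, then reseed
restore_rng_info <- function(info) {
  do.call(RNGkind, as.list(info$kinds))
  set.seed(info$seed)
  invisible(info)
}

info <- save_rng_info(42)
set.seed(info$seed)
first_run <- c(runif(2), rnorm(2))

# Later, a different generator may be active (simulated here)
RNGkind("Wichmann-Hill")
restore_rng_info(info)
second_run <- c(runif(2), rnorm(2))

identical(first_run, second_run)   # TRUE
```

Storing `info` (for example with saveRDS()) alongside the simulation output is what lets someone reproduce the run even if a future R changes its defaults.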
>
> I've been lurking here over fifteen years, and while I am getting old and
> forgetful I can remember exactly one such change where behaviour was changed,
> and (one of the) generators was altered---if memory serves in the early
> R 1.* days. [ Goes digging...] Yes, see `help(RNGkind)`, which
> details that R 1.7.0 made a change when "Buggy Kinderman-Ramage" was added as
> the old value, and "Kinderman-Ramage" was repaired. There once was a similar
> fix in the very early days of the Mersenne-Twister, which is why the GNU GSL
> has two variants with suffixes _1998 and _1999.
I seem to recall a bit of change around R-0.49, but old and forgetful would cover this too. For me, a bigger change was an unadvertised change in Splus: they compiled against a different math library at some point. This changed the lower bits in results. Individually the differences were mostly insignificant, but accumulated over a simulation they could amount to something fairly important. The amount of time I spent trying to find out why results would not reproduce was one of my main motivations for starting to use R.
>
> So your issue seems like pilot error to me: don't attach the parallel package
> if you do not plan to work in parallel. But "do if you do", and see its fine
> vignette on how it provides you reproducibility for multiple RNG streams.
>
> In general, you can very much trust R (and R Core) in these matters.
>
> Dirk
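For reference, the stream mechanism that the parallel vignette covers can be sketched with that package's own functions: select the "L'Ecuyer-CMRG" generator, seed it once, and derive further streams with parallel::nextRNGStream(). The seed 123, the four streams, and the helper draw_from_stream() are arbitrary, hypothetical choices for illustration.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")   # the generator the parallel package uses for streams
set.seed(123)

# Derive one stream per task; each element is a valid .Random.seed vector
n_streams <- 4
streams <- vector("list", n_streams)
streams[[1]] <- get(".Random.seed", envir = globalenv())
for (i in seq_len(n_streams)[-1]) {
  streams[[i]] <- nextRNGStream(streams[[i - 1]])
}

# Hypothetical helper: draw from a stream by installing it as the global RNG state
draw_from_stream <- function(stream, n) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(n)
}

draws <- lapply(streams, draw_from_stream, n = 3)
RNGkind("default")   # back to Mersenne-Twister for later code
```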
On 02/08/2015 09:40 AM, Gábor Csárdi wrote:
> On Sat, Feb 7, 2015 at
> I don't know if there is intention to keep this reproducible across R
> versions, but it is already not reproducible across platforms (with
> the same R version):
> http://stackoverflow.com/questions/21212326/floating-point-arithmetic-and-reproducibility
The situation is better in some respects, and worse in others, than what is described on Stack Overflow. I think the point is made pretty well there that you should not be trying to reproduce results beyond machine precision. My experience is that you can usually compare within a fuzz of 1e-14, even across platforms. (The package setRNG on CRAN has a function random.number.test(), which is run in the package's tests/ and makes uniform and normal comparisons to a tolerance of 1e-14. It has passed checks on all R platforms since 2004. Actually, the checks have been done since about 1995, but they were part of package dse earlier.) If you accumulate lots of lower-order parts (e.g. sum(simulated - true) in a long Monte Carlo run), then the fuzz may need to be much larger, especially when comparing across platforms. And you will have trouble with numerically unstable calculations. Once upon a time I was annoyed by this, but then I realized that it was better not to do unstable calculations.
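A minimal sketch of comparing within a fuzz rather than exactly, using base R's all.equal(), which tests mean relative difference against `tolerance`. The helper matches_reference() is hypothetical, and the 1e-14 and 1e-10 tolerances only mirror the discussion; the right fuzz for accumulated quantities depends on the computation. Summing the same numbers in two orders stands in for two platforms whose arithmetic differs in the low-order bits.

```r
# Hypothetical helper: TRUE if current agrees with reference within tol
matches_reference <- function(current, reference, tol = 1e-14) {
  isTRUE(all.equal(current, reference, tolerance = tol))
}

set.seed(0)
reference <- c(runif(5), rnorm(5))   # imagine these were saved on another machine

set.seed(0)
current <- c(runif(5), rnorm(5))
matches_reference(current, reference)   # individual draws: a tight fuzz is fine

# Accumulation: the same values added in two different orders with plain
# double arithmetic; the results may differ in the last bits
x <- runif(1e5)
forward  <- Reduce(`+`, x)
backward <- Reduce(`+`, rev(x))
forward - backward
matches_reference(forward, backward, tol = 1e-10)   # a looser fuzz for sums
```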
In addition to results not being reproducible beyond machine precision across R versions and across platforms, you really cannot be guaranteed reproducibility even on the same platform and the same version of R. You may get different results if you upgrade the OS and there has been a change in the math libraries. In my experience this happens rather often. I don't think there is any specific 32 vs 64 bit issue, but math libraries sometimes do things a bit differently on different processors (e.g. processor bug fixes), so you can occasionally get differences with everything the same except the hardware.
On 02/07/2015 10:52 PM, otoomet wrote:
> It turned out that this is because package "parallel", buried deep
> in my dependencies, calls runif() during its initialization and
> in this way changes the random number sequence.
Guessing a bit about what you are saying: (1) you set the random seed, (2) you did some things which included loading package parallel, and (3) you ran some things for which you expected results comparable to a previous run in which you did (1) and (2) in the reverse order.
If I understand this correctly, I suggest you always do everything exactly the same after you set the seed. There are lots of things that could generate random numbers without you really knowing. Thus, it is usually better to set the seed immediately before you start doing anything where you want the RNG to be in a known state. (There is an even better suggestion in the somewhat dated vignette with package setRNG.)
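The failure mode and the fix fit in a few lines. Here load_something() is a hypothetical stand-in for any code, such as a package's initialization, that quietly consumes random numbers between set.seed() and the computation you care about; simulate() is likewise a hypothetical placeholder.

```r
# Hypothetical stand-in for code that draws random numbers behind your back
load_something <- function() invisible(runif(1))

# Hypothetical computation whose result should be reproducible
simulate <- function() mean(rnorm(100))

# Reference run with nothing between set.seed() and the work
set.seed(0)
expected <- simulate()

# Fragile: something consumes random numbers after the seed is set
set.seed(0)
load_something()
fragile <- simulate()

# Robust: do the setup first, then set the seed immediately before the work
load_something()
set.seed(0)
robust <- simulate()

identical(robust, expected)    # TRUE
identical(fragile, expected)   # the stream was advanced by one uniform draw
```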
Finally, if you do intend to use parallel sometimes, then you have additional considerations. You would like to get the same results no matter how many machines you are using. This may place some constraints on the generators you use; not all are equally easy to use in parallel. So if you are hoping to get the same results in parallel as you get on a single machine, then you had better start out using generators on the single machine that you will be able to use in parallel.
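One way to get results that do not depend on the number of workers, assuming the L'Ecuyer-CMRG streams from the parallel package, is to give each task (rather than each worker) its own stream. The sketch below simulates this sequentially: the tasks run once as a single group, and once split into two chunks processed in reverse order, standing in for two workers. make_task_streams(), run_task(), and the chunking are hypothetical illustrations, not an API from parallel.

```r
library(parallel)

# Hypothetical helper: one L'Ecuyer-CMRG stream per task from one master seed
make_task_streams <- function(seed, n_tasks) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  s <- get(".Random.seed", envir = globalenv())
  streams <- vector("list", n_tasks)
  for (i in seq_len(n_tasks)) {
    streams[[i]] <- s
    s <- nextRNGStream(s)
  }
  streams
}

# Hypothetical task: a small Monte Carlo estimate using only its own stream
run_task <- function(stream) {
  assign(".Random.seed", stream, envir = globalenv())
  mean(rnorm(1000))
}

streams <- make_task_streams(2015, n_tasks = 6)

# "One worker": all tasks in order
one_worker <- vapply(streams, run_task, numeric(1))

# "Two workers": tasks split into two chunks, the second chunk run first
chunks <- split(seq_along(streams), rep(1:2, each = 3))
two_workers <- numeric(length(streams))
for (chunk in rev(chunks)) {
  two_workers[chunk] <- vapply(streams[chunk], run_task, numeric(1))
}

identical(one_worker, two_workers)   # TRUE: results follow tasks, not workers
RNGkind("default")                   # back to Mersenne-Twister for later code
```

Because each result depends only on its task's stream, the scheduling order and the number of workers do not affect the output.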
Paul
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel