'Re: [Rd] Which function can change RNG state?'

List:       r-devel
Subject:    Re: [Rd] Which function can change RNG state?
From:       Paul Gilbert <pgilbert902 () gmail ! com>
Date:       2015-02-09 5:03:11
Message-ID: 54D83F8F.2020605 () gmail ! com

On 02/08/2015 09:33 AM, Dirk Eddelbuettel wrote:
>
> On 7 February 2015 at 19:52, otoomet wrote:
> | random numbers.   For instance, can I be sure that
> | set.seed(0); print(runif(1)); print(rnorm(1))
> | will always print the same numbers, also in the future version of R?  There
>
> Yes, pretty much.

This is nearly correct. The user could change the uniform or normal
generator, since there are options other than the defaults, which would
mean the result would be different. And obviously if they changed print
precision then the printed result may be truncated differently.
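
A minimal sketch of both caveats, using only base R. The choice of
"Wichmann-Hill" and "Box-Muller" is just an illustration; any non-default
kind listed in help(RNGkind) would make the same point.

```r
# Draws under whatever generators are currently selected.
set.seed(0)
default_draw <- c(runif(1), rnorm(1))

# Remember the current kinds, then switch to non-default generators.
old_kind <- RNGkind()
RNGkind(kind = "Wichmann-Hill", normal.kind = "Box-Muller")
set.seed(0)
other_draw <- c(runif(1), rnorm(1))   # same seed, different numbers

# Restore the earlier kinds and confirm the first result comes back.
do.call(RNGkind, as.list(old_kind))
set.seed(0)
restored_draw <- c(runif(1), rnorm(1))

# Print precision changes only what is displayed, not the stored value.
short_form <- format(default_draw[1], digits = 3)
long_form  <- format(default_draw[1], digits = 15)
```

Here default_draw and restored_draw should be identical, while other_draw
differs even though the seed is the same.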

I think you could prepare for future versions of R by saving information
about the generators you are using. The precedent has already been set
(R-1.7.0) that the default could change if there is a good reason. A
good reason might be that the RNG is found not to be so good relative to
others that become available. But I think the old generator would
continue to be available, so people can reproduce old results. (Package
setRNG has some utilities to help save and reset, but there is nothing
especially difficult or fancy, just a few details that need to be
remembered.)
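
As a rough sketch of what "saving information about the generators" could
look like with base R alone (this is not the setRNG interface; the list
name rng_record and its fields are made up for the example):

```r
# Record everything needed to get back to the same RNG state later.
rng_record <- list(
  r_version = R.version.string,  # for the record; not used to reset
  kind      = RNGkind(),         # uniform, normal (and sample) kinds
  seed      = 0                  # the seed passed to set.seed()
)

do.call(RNGkind, as.list(rng_record$kind))
set.seed(rng_record$seed)
first_run <- runif(3)

# ... later, possibly in another session after reading rng_record back ...
do.call(RNGkind, as.list(rng_record$kind))
set.seed(rng_record$seed)
second_run <- runif(3)
```

Naming the kinds explicitly, rather than relying on the defaults, is what
would protect the result if a future R changed its default generator.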
>
> I've been lurking here over fifteen years, and while I am getting old and
> forgetful I can remember exactly one such change where behaviour was changed,
> and (one of the) generators was altered---if memory serves in the earlier
> days of R 1.*. [ Goes digging...] Yes, see `help(RNGkind)` which
> details that R 1.7.0 made a change when "Buggy Kinderman-Ramage" was added as
> the old value, and "Kinderman-Ramage" was repaired.  There once was a similar
> fix in the very early days of the Mersenne-Twister which is why the GNU GSL
> has two variants with suffixes _1999 and _1998.

I seem to recall a bit of change around R-0.49 but old and forgetful
would cover this too. For me, a bigger change was an unadvertised change
in Splus - they compiled against a different math library at some point.
This changed the lower bits in results, mostly insignificant but
accumulated simulation results could amount to something fairly
important. The amount of time I spent trying to find why results would
not reproduce was one of my main motivations for starting to use R.
>
> So your issue seems like pilot error to me:  don't attach the parallel package
> if you do not plan to work in parallel.  But "do if you do", and see its fine
> vignette on how it provides you reproducibility for multiple RNG streams.
>
> In general, you can very much trust R (and R Core) in these matters.
>
> Dirk

On 02/08/2015 09:40 AM, Gábor Csárdi wrote:
 > On Sat, Feb 7, 2015 at
 > I don't know if there is intention to keep this reproducible across R
 > versions, but it is already not reproducible across platforms (with
 > the same R version):
 > http://stackoverflow.com/questions/21212326/floating-point-arithmetic-and-reproducibility

The situation is better in some respects, and worse in others, than what
is described on stackoverflow. I think the point is made pretty well
there that you should not be trying to reproduce results beyond machine
precision. My experience is that you can usually compare within a fuzz
of 1e-14, even across platforms. (The package setRNG on CRAN has a
function random.number.test() which is run in the package's tests/ and
makes uniform and normal comparisons to 1e-14. It has passed checks on
all R platforms since 2004. Actually, the checks have been done since
about 1995 but they were part of package dse earlier.)  If you
accumulate lots of lower order parts (e.g. sum(simulated - true) in a
long Monte Carlo) then the fuzz may need to get much larger, especially
when comparing across platforms. And you will have trouble with
numerically unstable calculations. Once upon a time I was annoyed by
this, but then I realized that it was better not to do unstable
calculations.
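
To illustrate comparing within a fuzz rather than exactly, here is a small
sketch. The helper within_fuzz() is made up for this example (it is not
the setRNG test), and the two orders of addition merely stand in for the
kind of low-bit difference a changed math library can introduce.

```r
# Hypothetical helper: TRUE if all elements agree to within 'fuzz'.
within_fuzz <- function(x, y, fuzz = 1e-14) {
  all(abs(x - y) <= fuzz)
}

# The same sum evaluated in two orders differs in the last bit.
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)

exactly_equal <- identical(a, b)     # exact comparison is too strict
close_enough  <- within_fuzz(a, b)   # comparison to 1e-14 is not
```

For accumulated quantities such as a long Monte Carlo sum, the fuzz
argument would have to be loosened, and how much depends on the
calculation.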

In addition to not being reproducible beyond machine precision across R
versions and across platforms, results are really not guaranteed even on
the same platform and same version of R. You may get different results
if you upgrade the OS and there has been a change in the math libraries.
In my experience this happens rather often. I don't think there is any
specific 32 vs 64 bit issue, but math libraries sometimes do things a
bit differently on different processors (e.g. processor bug fixes) so you
can occasionally get differences with everything the same except the
hardware.
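
One way to at least notice such a change afterwards is to store a few
facts about the environment next to the results. This is only a sketch
under my own assumptions: the list name env_record and the choice of
fields are illustrative, and it cannot detect every library or hardware
difference.

```r
# Record the environment a result was produced in.
env_record <- list(
  r_version = R.version.string,    # R release
  platform  = R.version$platform,  # e.g. CPU architecture and OS type
  lapack    = La_version(),        # LAPACK version R is using
  rng_kind  = RNGkind()            # generators in effect
)

# Later: compare a stored record with the current one before trusting
# a bit-for-bit comparison of results.
same_environment <- identical(env_record, list(
  r_version = R.version.string,
  platform  = R.version$platform,
  lapack    = La_version(),
  rng_kind  = RNGkind()
))
```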

On 02/07/2015 10:52 PM, otoomet wrote:
 > It turned out that this is because package "parallel", buried deep
 > in my dependencies, calls runif() during its initialization and
 > in this way changes the random number sequence.

Guessing a bit about what you are saying: 1/ you set the random seed,
2/ you did some things which included loading package parallel, 3/ you
ran some things for which you expected to get results comparable to some
previous run when you did 1/ and 2/ in the reverse order.

If I understand this correctly, I suggest you always do everything
exactly the same after you set the seed. There are lots of things that
could generate random numbers without you really knowing. Thus, it is
usually better to set the seed immediately before you start doing
anything where you want the seed to have a known state. (There is an
even better suggestion in the somewhat dated vignette with package setRNG.)
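
A small sketch of why the order matters. The extra runif(1) call is only
a stand-in for a hidden draw, such as one made while a package loads.

```r
# Seed first, then something draws a random number behind your back.
set.seed(0)
invisible(runif(1))        # stand-in for a hidden draw
after_hidden_draw <- rnorm(1)

# Reference: seed set with nothing in between.
set.seed(0)
reference <- rnorm(1)

# Hidden draw first, seed set immediately before the part that matters.
invisible(runif(1))
set.seed(0)
seed_set_last <- rnorm(1)
```

Only seed_set_last matches reference; after_hidden_draw does not, because
the hidden draw advanced the generator after the seed was set.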

Finally, if you do intend to use parallel sometimes then you have
additional considerations. You would like to get the same results no
matter how many machines you are using. This may place some constraints
on the generators you use; not all are equally easy to use in parallel.
So if you are hoping to get the same results in parallel as you get on a
single machine then you had better start out using generators on the
single machine that you will be able to use in parallel.
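
As a sketch of one such approach: give each task (not each machine) its
own stream from the "L'Ecuyer-CMRG" generator, using nextRNGStream() from
package parallel. The function run_task() and the number of tasks are
made up for the example, and no workers are actually started here; the
tasks are simply run in two different orders to mimic different
scheduling.

```r
library(parallel)

old_kind <- RNGkind()
RNGkind("L'Ecuyer-CMRG")
set.seed(123)

# One independent stream per task, fixed before any work is handed out.
n_tasks <- 4
streams <- vector("list", n_tasks)
s <- get(".Random.seed", envir = globalenv())
for (i in seq_len(n_tasks)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)
}

# Hypothetical task: install its stream, then draw.
run_task <- function(stream) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(2)
}

in_order <- lapply(streams, run_task)
reversed <- rev(lapply(rev(streams), run_task))  # different execution order

do.call(RNGkind, as.list(old_kind))  # put the earlier generators back
```

Because each task's numbers depend only on its own stream, in_order and
reversed agree regardless of which task ran first or where.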

Paul

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel