List:       r-help
Subject:    Re: [R] Archive format
From:       Joe Gain <joe.gain () uni-konstanz ! de>
Date:       2017-03-30 8:14:51
Message-ID: a211e8d5-5261-80f9-c04b-2d278dee5ebf () uni-konstanz ! de

On 29.03.2017 17:36, Jeff Newmiller wrote:
> The relevance to R (and therefore R-help) of this question is marginal at best. R
> might not be the language of choice when you go retrieve the data.
> Also, this question seems dangerously close to a troll, because the obvious answer
> is that the data should be in an open format but if you are not currently working
> with data in an open format then you increase the cost of archiving and risk losing
> information up front by extracting it from a proprietary format, and balancing
> those concerns is more political than technical.
> Note that there exist open binary formats, and the goals of your archiving task and
> nature of the data would have to be considered in deciding which of the many to
> use. My own experience has been that plain text survives time best, but YMMV.

Well, I didn't mean to troll the list. We have a small section on R, and 
in response to a question that we got from a user, we thought it would 
be a good idea to check with some actual R-users.

I think the responses are pretty much in line with what we expected. 
There's unsurprisingly no simple solution. A text format is advantageous 
due to the many options that a user has to work with text data. Your 
point about the format of the source data is valid: it can be a clear 
constraint (other constraints are, for example, of a legal nature). 
I'm not trying to advocate for open formats per se, just trying to 
gather information so as to be able to make a recommendation.

I think we need to restructure the information on our web platform to 
clearly differentiate between data and the source code, scripts etc. 
which are used to process the data ("algorithms").

There is a big problem with data that has been archived but nobody knows 
what it is or was for. Archiving, sharing and reproducibility are 
important subjects, and we are interested in the experience of 
statisticians in dealing with these problems.

Thanks for the replies!
Joe

-- 
B 1003
Kommunikations-, Informations-, Medienzentrum (KIM)
Universitaet Konstanz

t: ++49-7531-883234
e: joe.gain@uni-konstanz.de


