'Re: [R] Adding NA values in random positions in a dataframe'


List:       r-help
Subject:    Re: [R] Adding NA values in random positions in a dataframe
From:       Bert Gunter <gunter.berton () gene ! com>
Date:       2013-11-29 21:00:04
Message-ID: CACk-te1_xxp0ONW7ZE3PXYyX0QFUxm7ds6ye8FOrZ1yNDpeO8g () mail ! gmail ! com

An essentially identical approach that may be a tad clearer (but that
requires additional space) first creates a logical vector marking the
locations of the NAs in the unlisted data frame. Additional NA positions
are then chosen at random from the remaining non-NA positions, and the
augmented vector is reshaped into a logical matrix that indexes where the
NAs should go in the data frame:

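set.seed(1) ## for reproducibility
df <- data.frame(a = c(1:3, NA, 4:6),
                 b = c(letters[1:6], NA),
                 c = c(1, NA, runif(5)))

nr <- nrow(df); nc <- ncol(df)
p <- .3 ## desired total proportion of NAs

ina <- is.na(unlist(df)) ## logical vector, TRUE corresponds to NA positions
n2 <- max(0, floor(p*nr*nc) - sum(ina))  ## number of new NAs (0 if p is already reached)

## sample only among the currently non-NA positions, so existing NAs are
## kept and exactly n2 new NAs are added
ina[sample(which(!ina), n2)] <- TRUE
df[matrix(ina, nrow = nr, ncol = nc)] <- NA ## using logical matrix indexing

df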

Cheers,
Bert
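
A minimal sketch that wraps the same idea in a reusable function. The name `add_random_na` and its arguments are hypothetical and do not come from the thread. Like the code above, it rounds the target count down with floor() and leaves the data unchanged if the target is already met. It draws positions with `sample.int()` on the non-NA indices, which sidesteps `sample()`'s special case of sampling from 1:x when handed a single number.

```r
## Hypothetical helper (not from the thread): add NAs at random non-NA cells
## of a data frame until floor(p * number_of_cells) cells are NA in total.
add_random_na <- function(df, p) {
  stopifnot(is.data.frame(df), p >= 0, p <= 1)
  ina <- is.na(df)                       # logical matrix, TRUE where df is NA
  n_new <- floor(p * length(ina)) - sum(ina)
  if (n_new <= 0) return(df)             # target already met: return df unchanged
  free <- which(!ina)                    # linear (column-major) indices of non-NA cells
  ina[free[sample.int(length(free), n_new)]] <- TRUE  # mark n_new extra cells
  df[ina] <- NA                          # logical-matrix indexing; existing NAs stay NA
  df
}

## Usage, with the example data from the post
set.seed(1)
df <- data.frame(a = c(1:3, NA, 4:6),
                 b = c(letters[1:6], NA),
                 c = c(1, NA, runif(5)))
df2 <- add_random_na(df, 0.3)
sum(is.na(df2))   # 6, i.e. 3 existing + 3 new out of 21 cells
```

Working on the logical matrix from is.na(df) rather than on unlist(df) avoids coercing mixed column types, and the final df[ina] <- NA assigns column by column, so each column keeps its own type.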

On Fri, Nov 29, 2013 at 10:09 AM, arun <smartpink111@yahoo.com> wrote:
> Hi,
> I used that because 10% of the values in the data were already NA.
> 
> 
> You are right.  Sorry, match() is unnecessary.  I was trying another solution with
> match() which didn't work out, and I forgot to check whether it was needed or not.
> set.seed(49)
> dat1[!is.na(dat1)][sample(seq(dat1[!is.na(dat1)]),length(dat1[!is.na(dat1)])*(0.20))] <- NA
> A.K.
> 
> 
> Thanks for the reply. I don't understand the 0.20 multiplied by the length of the
> non-NA values; where did you take it from?
> Furthermore, why do we have to use the function match? Wouldn't it be enough to use
> the sample function?
> 
> On Thursday, November 28, 2013 12:57 PM, arun <smartpink111@yahoo.com> wrote:
> Hi,
> One way would be:
> set.seed(42)
> dat1 <- as.data.frame(matrix(sample(c(1:5,NA),50,replace=TRUE,prob=c(10,15,15,20,30,10)),ncol=5))
> set.seed(49)
> dat1[!is.na(dat1)][match(sample(seq(dat1[!is.na(dat1)]), length(dat1[!is.na(dat1)])*(0.20)),
>                          seq(dat1[!is.na(dat1)]))] <- NA
> length(dat1[is.na(dat1)])/length(unlist(dat1))
> #[1] 0.28
> 
> A.K.
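
Why the result above is 0.28 rather than 0.30: the 0.20 factor is applied to the non-NA cells only, so with about 10% of the cells already NA the overall proportion becomes roughly 0.10 + 0.20 * 0.90 = 0.28. To reach 30% of all cells, the number of new NAs can instead be computed from the total cell count (equivalently, about (0.30 - 0.10)/0.90, or 22%, of the non-NA cells). A sketch of that adjustment, reusing arun's example data; the variable names `target`, `n_total`, `n_new` and `vals` are illustrative, and which cells get blanked depends on the seed.

```r
## Sketch: size the number of new NAs from the total number of cells,
## so the overall NA proportion reaches the target.
set.seed(42)
dat1 <- as.data.frame(matrix(sample(c(1:5, NA), 50, replace = TRUE,
                                    prob = c(10, 15, 15, 20, 30, 10)), ncol = 5))
target  <- 0.30                                   # desired overall NA proportion
n_total <- nrow(dat1) * ncol(dat1)                # 50 cells
n_new   <- max(0, round(target * n_total) - sum(is.na(dat1)))
vals    <- dat1[!is.na(dat1)]                     # non-NA values, column by column
vals[sample.int(length(vals), n_new)] <- NA       # blank n_new of them at random
dat1[!is.na(dat1)] <- vals                        # write back; old NAs untouched
mean(is.na(dat1))                                 # overall NA proportion
```

This is the same extract-modify-write-back pattern as arun's one-liner, split into steps; only the count changes, and match() is not needed.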
> 
> 
> Hello, I'm quite new to R, so I don't know the most efficient
> way to do something that I could easily write in other languages.
> 
> This is my problem: I have a data frame with a certain number of
> NA values (approximately 10%). I want to add further NA values at random
> positions in the data frame until the overall proportion of NA
> values reaches 30% (the positions that are already NA must not
> change). I tried looking at iterative functions in R such as apply or sapply,
> but I can't figure out how to use them in this case. Thank you.
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374

