List: r-help
Subject: [R] aggregating data and missing values
From: "Pascal A. Niklaus" <Pascal.Niklaus () unibas ! ch>
Date: 2005-11-02 12:59:55
Message-ID: 200511021359.55327.Pascal.Niklaus () unibas ! ch
Hi all,
I would like to aggregate a large data file that is defined by a number of
factors and associated values. The point is that not all factor level
combinations are present in the data file -- these "missing" values are in
fact to be treated as zeroes.
Is there a straightforward way to either
a) expand the existing data set so that the missing factor combinations
are added, or
b) use an "aggregate"-like function that generates a row of data for every
factor combination?
Here is an example:
a) "complete" data set:
> example <- data.frame(f1=factor(rep(LETTERS[1:3],each=4)),
+                       f2=factor(letters[1:2]),d=1:12)
> aggregate(cbind(d=example$d),by=list(f1=example$f1,f2=example$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14
6  C  b 22
b) data set with "missing combinations":
> example2 <- example[c(-10,-12),]
> aggregate(cbind(d=example2$d),by=list(f1=example2$f1,f2=example2$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14
Here, I would like to have the missing row with f1=C, f2=b, d=NA.
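One possible way to get such a row, sketched here only for the two-factor example above and not tested on a large data set, might be to aggregate with tapply, which keeps a cell for every combination of factor levels, and then flatten the resulting table back into a data frame. The names tab and res are just illustrative.

```r
# Rebuild the example with the two "C b" rows removed
example <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                      f2 = factor(letters[1:2]),
                      d  = 1:12)
example2 <- example[c(-10, -12), ]

# tapply returns one cell per combination of factor levels;
# combinations with no data come back as NA
tab <- tapply(example2$d, list(f1 = example2$f1, f2 = example2$f2), sum)

# Flatten the f1 x f2 table into one row per combination
res <- as.data.frame(as.table(tab), responseName = "d")
print(res)
```

This relies on the unused level combination still being present in levels(f1) and levels(f2) after subsetting, which is the case here.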
The solution I have come up with is very slow and cumbersome (because there
are many factors), and I am convinced that there is a better way to do this:
I create a new data frame with all factor combinations present and then copy
the results from the call to aggregate line by line into the new data frame.
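For illustration, the "all combinations" idea could perhaps be done without the line-by-line copy by building the grid with expand.grid and joining the aggregate result onto it with merge. This is only a minimal sketch for the two-factor example; the names agg, grid, full and full0 are made up here.

```r
# Rebuild the example with the two "C b" rows removed
example <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                      f2 = factor(letters[1:2]),
                      d  = 1:12)
example2 <- example[c(-10, -12), ]

# Aggregate as before; the combination C/b is absent from the result
agg <- aggregate(cbind(d = example2$d),
                 by = list(f1 = example2$f1, f2 = example2$f2), sum)

# Every combination of the factor levels, whether observed or not
grid <- expand.grid(f1 = levels(example2$f1), f2 = levels(example2$f2))

# Keep all rows of grid; combinations missing from agg get d = NA
full <- merge(grid, agg, by = c("f1", "f2"), all.x = TRUE)

# If the missing combinations should count as zeroes instead
full0 <- full
full0$d[is.na(full0$d)] <- 0
print(full0)
```

With more factors, the same pattern should extend by adding one levels(...) argument per factor to expand.grid, although the grid grows with the product of the numbers of levels.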
Thanks for your help
Pascal
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html