Hi all,

I would like to aggregate a large data file that is defined by a number of factors and associated values. The point is that not all factor level combinations are present in the data file; these "missing" values are in fact to be treated as zeroes.

Is there a straightforward way either a) to expand the existing data set so that the missing factor combinations are added, or b) to use an "aggregate"-like function that generates a row of data for every factor combination?

Here is an example:

a) "complete" data set:

> example <- data.frame(f1=factor(rep(LETTERS[1:3],each=4)),f2=factor(letters[1:2]),d=1:12)
> aggregate(cbind(d=example$d),by=list(f1=example$f1,f2=example$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14
6  C  b 22

b) data set with "missing combinations":

> example2 <- example[c(-10,-12),]
> aggregate(cbind(d=example2$d),by=list(f1=example2$f1,f2=example2$f2),sum)
  f1 f2  d
1  A  a  4
2  B  a 12
3  C  a 20
4  A  b  6
5  B  b 14

Here, I would like to have the missing row with f1=C, f2=b, d=NA.

The solution I have come up with is very slow and cumbersome because there are many factors: I create a new data frame with all factor combinations present and then copy the results from the call to aggregate, line by line, into the new data frame. I am convinced that there is a better way to do this.

Thanks for your help
Pascal

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
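For what it is worth, here is a minimal sketch of one way option a) might be done in base R, using the example2 data from the post. It is an illustration under assumptions, not a tested solution for the real data: the object names (agg, grid, full, full_zero, tab) are made up for this example, and it assumes the factors still carry all of their levels after subsetting, as they do here. The idea is to build every combination of factor levels with expand.grid() and left-join the aggregated result onto it with merge(), so that absent combinations appear with d = NA.

```r
# Data from the post: rows 10 and 12 are removed, so the
# combination f1 = C, f2 = b no longer occurs.
example  <- data.frame(f1 = factor(rep(LETTERS[1:3], each = 4)),
                       f2 = factor(letters[1:2]),
                       d  = 1:12)
example2 <- example[c(-10, -12), ]

# Aggregate only the combinations that are present (5 rows).
agg <- aggregate(d ~ f1 + f2, data = example2, FUN = sum)

# Every combination of the factor levels, including unobserved ones.
grid <- expand.grid(f1 = levels(example2$f1),
                    f2 = levels(example2$f2))

# Left join: combinations missing from `agg` get d = NA.
full <- merge(grid, agg, by = c("f1", "f2"), all.x = TRUE)

# Optional: treat missing combinations as zero instead of NA.
full_zero <- full
full_zero$d[is.na(full_zero$d)] <- 0

# Alternative when the summary is a sum: xtabs() fills empty
# cells with 0, and as.data.frame() returns it in long format.
tab <- as.data.frame(xtabs(d ~ f1 + f2, data = example2),
                     responseName = "d")
```

With many factors the grid grows as the product of the numbers of levels, so the xtabs() variant may be the more convenient one when zeroes (rather than NA) are what is wanted and the summary is a plain sum.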