List:       jakarta-commons-dev
Subject:    [jira] Commented: (MATH-224) Utility method to aggregate Statistics
From:       "Phil Steitz (JIRA)" <jira () apache ! org>
Date:       2008-11-29 21:16:46
Message-ID: 218049686.1227993406651.JavaMail.jira () brutus


    [ https://issues.apache.org/jira/browse/MATH-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651755#action_12651755 ]

Phil Steitz commented on MATH-224:
----------------------------------

After looking carefully at the patch and the higher moments problem, I am
now leaning toward WONTFIX for this.  The problem is that the "storeless"
statistics are only required to store enough data to support updates via
single value increments.  Adding the requirement to support aggregation in
the sense defined here places an unacceptable limitation on the
implementing classes.  The second moment and variance, for example, only
work now because the default implementations carry along nested first
moments.  This setup is a little awkward and might be changed; but then it
would not be obvious how to support aggregation.  It is not obvious to me
how to support this for fourth moments at all.  In any case, I think it is
too restrictive a requirement to place on implementations, so if we do
support this, it should be via a subclass of SummaryStatistics.
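
To illustrate the point about nested first moments, here is a minimal
sketch of merging two second-moment summaries. It is not the Commons Math
implementation: the Moments class and its increment/aggregate/variance
methods are hypothetical names made up for this example, and the merge
uses the standard pairwise update for the sum of squared deviations. Note
that the merge needs each side's count and mean in addition to its second
moment, which is more state than a single getResult() value exposes.

```java
// Hypothetical sketch only; none of these names exist in Commons Math.
class MomentAggregationSketch {

    /** Running count, mean, and sum of squared deviations from the mean. */
    public static final class Moments {
        public long n;
        public double mean;
        public double m2;

        /** Single-value increment (Welford-style update). */
        public void increment(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }

        /** Merge another summary into this one; requires both means. */
        public void aggregate(Moments other) {
            if (other.n == 0) {
                return; // nothing to merge
            }
            if (n == 0) {
                n = other.n;
                mean = other.mean;
                m2 = other.m2;
                return;
            }
            long total = n + other.n;
            double delta = other.mean - mean;
            // Cross term accounts for the gap between the two means.
            m2 += other.m2 + delta * delta * ((double) n * other.n / total);
            mean += delta * other.n / total;
            n = total;
        }

        /** Sample variance; NaN when fewer than two values were seen. */
        public double variance() {
            return n > 1 ? m2 / (n - 1) : Double.NaN;
        }
    }

    public static void main(String[] args) {
        Moments a = new Moments();
        Moments b = new Moments();
        for (double x : new double[] {1, 2, 3, 4}) {
            a.increment(x);
        }
        for (double x : new double[] {5, 6, 7, 8, 9}) {
            b.increment(x);
        }
        a.aggregate(b);
        System.out.println("n=" + a.n + " mean=" + a.mean
                + " variance=" + a.variance());
    }
}
```

Dropping the stored mean from such a class would leave the merge without
the delta term, which is the difficulty described above.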

> Utility method to aggregate Statistics
> --------------------------------------
> 
> Key: MATH-224
> URL: https://issues.apache.org/jira/browse/MATH-224
> Project: Commons Math
> Issue Type: Improvement
> Reporter: Andre Panisson
> Assignee: Phil Steitz
> Priority: Minor
> Fix For: 2.0
> 
> Attachments: commons_math.patch
> 
> 
> Below is the conversation related to this topic that was posted to the
> Commons Users group.
> -------------------------------------------------
> Hi,
> > 
> > I'm writing a complex validation algorithm, that makes a K-Fold
> > cross-validation using a data set. The data set is partitioned into K
> > subsamples, and of the K subsamples, a single subsample is retained
> > as the validation data for testing, and the remaining K - 1
> > subsamples are used as training data. The process is then repeated K
> > times, and at the end the K results are aggregated to a single
> > result. The problem is that all K results return Statistics objects
> > (org.apache.commons.math.stat.descriptive.SummaryStatistics), and I
> > need to make the aggregation of all K objects in a single Statistics.
> > I think it is a common problem in the statistics field. Has
> > anyone already implemented a utility method to do it?
> There is no such feature currently in commons-math. The
> SummaryStatistics class wraps a bunch of specialized statistics classes
> (Sum, Mean, Max, SumOfSquares ...) which can be overridden by
> user-provided StorelessUnivariateStatistic implementations.
> So this feature should be added to the StorelessUnivariateStatistic
> interface and all its implementations, with a signature like this:
> public void aggregate(StorelessUnivariateStatistic otherStatistic);
> The implementation of this method should only use the
> StorelessUnivariateStatistic methods, i.e. getResult() and getN(). This
> seems feasible for the statistics used by SummaryStatistics, but has not
> been done yet.
> One should be aware that SummaryStatistics does not enforce strong
> typing, so one could call aggregate on a Sum instance and provide it a
> Min instance, which would of course result in meaningless results.
> > Or maybe it would be interesting to request it as an Improvement to
> > the Commons Math developers, adding an "aggregator" to all Statistics
> > implementations?
> If you want to request this improvement, please open a ticket for it
> using our JIRA tracking system:
> http://issues.apache.org/jira/browse/MATH. You'll have to register to be
> able to add your feature request. You can also provide a patch if you
> want to contribute it by yourself.
> Luc
> > 
> > Thanks in advance,
> > 
> > Andre Panisson

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

