List:       jakarta-commons-dev
Subject:    [jira] [Created] (MATH-1310) Improve accuracy and performance of 2-sample Kolmogorov-Smirnov test
From:       "Phil Steitz (JIRA)" <jira () apache ! org>
Date:       2015-12-31 20:04:39
Message-ID: JIRA.12925144.1451592220000.167.1451592279779 () Atlassian ! JIRA

Phil Steitz created MATH-1310:
---------------------------------

             Summary: Improve accuracy and performance of 2-sample Kolmogorov-Smirnov test
                 Key: MATH-1310
                 URL: https://issues.apache.org/jira/browse/MATH-1310
             Project: Commons Math
          Issue Type: Bug
    Affects Versions: 3.5
            Reporter: Phil Steitz
             Fix For: 3.6


As of 3.5, the exactP method used to compute exact p-values for 2-sample
Kolmogorov-Smirnov tests is very slow, because it is based on a naive
implementation that enumerates every partition of the combined sample into
subsets of sizes n and m. As a result, its use is not recommended for problems
where the product of the two sample sizes exceeds 100, and the
kolmogorovSmirnovTest method uses it only when that product is at most 100.
For sample size products between 100 and 10000, which are too large for exactP
but below the range where the asymptotic KS distribution is used, the method
currently falls back on Monte Carlo simulation. Convergence is poor for many
problem instances, resulting in inaccurate p-values.
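For illustration, here is a minimal Java sketch of the naive strategy described above, assuming continuous data with no ties, so that only the interleaving of the two samples in the pooled order matters. The class and method names (NaiveKsExactP, exactP(d, n, m), statistic) are hypothetical; this is not the Commons Math 3.5 source. Its cost is proportional to the C(n + m, n) partitions, each scored in O(n + m) time; with n = m = 10 (a sample size product of 100) that is already 184,756 partitions.

```java
// Hypothetical sketch of naive exact-p enumeration; not the Commons Math 3.5 source.
// Assumes no ties: a partition is fully described by which pooled positions hold x values.
final class NaiveKsExactP {

    /** Returns P(D >= d) under H0 for sample sizes n and m by full enumeration. */
    static double exactP(double d, int n, int m) {
        long[] counts = new long[2]; // [0] = partitions with D >= d, [1] = all partitions
        enumerate(new boolean[n + m], 0, n, d, n, m, counts);
        return (double) counts[0] / counts[1];
    }

    /** Recursively assigns each pooled position to x (true) or y (false). */
    private static void enumerate(boolean[] fromX, int pos, int xLeft,
                                  double d, int n, int m, long[] counts) {
        int remaining = fromX.length - pos;
        if (xLeft > remaining) {
            return; // not enough positions left to place the remaining x values
        }
        if (pos == fromX.length) { // here xLeft == 0: a complete partition
            counts[1]++;
            if (statistic(fromX, n, m) >= d) {
                counts[0]++;
            }
            return;
        }
        if (xLeft > 0) {
            fromX[pos] = true;
            enumerate(fromX, pos + 1, xLeft - 1, d, n, m, counts);
        }
        fromX[pos] = false;
        enumerate(fromX, pos + 1, xLeft, d, n, m, counts);
    }

    /** D = max over pooled positions of |F_x - F_y| for one partition. */
    static double statistic(boolean[] fromX, int n, int m) {
        int i = 0; // x values seen so far
        int j = 0; // y values seen so far
        double max = 0.0;
        for (boolean isX : fromX) {
            if (isX) {
                i++;
            } else {
                j++;
            }
            max = Math.max(max, Math.abs((double) i / n - (double) j / m));
        }
        return max;
    }
}
```

For example, with n = m = 2 only the orderings xxyy and yyxx give D = 1, so exactP(1.0, 2, 2) returns 2/6.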

To eliminate the need for Monte Carlo simulation and to improve the
performance of exactP itself, a faster exactP implementation should be added.
It can be implemented by unwinding the recursive functions defined in Chapter 5,
Table 5.2 of:

Wilcox, Rand. 2012. Introduction to Robust Estimation and Hypothesis Testing,
3rd ed., Chapter 5. Academic Press.
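One way to avoid the enumeration is a dynamic program over the lattice of partial sample counts. The Java sketch below uses the standard lattice-path formulation of the exact null distribution; it is not a transcription of Wilcox's Table 5.2 (we assume, without having checked, that it is equivalent to the recursion referenced there), the names are hypothetical, and ties are assumed absent. A pooled ordering corresponds to a monotone path from (0,0) to (n,m), and D < d exactly when every point (i,j) on the path satisfies |i/n - j/m| < d. Storing each path count divided by C(i+j, i) turns the recurrence c(i,j) = c(i-1,j) + c(i,j-1) into a weighted sum with weights i/(i+j) and j/(i+j), which avoids overflowing the binomial coefficients. The cost is O(nm) time and O(m) memory.

```java
// Hypothetical lattice-path sketch; not the Commons Math 3.6 implementation.
// Assumes no ties in the pooled sample.
final class LatticeKsExactP {

    /**
     * Returns P(D >= d) under H0 for sample sizes n and m.
     * While processing row i, u[j] holds the fraction of monotone lattice
     * paths from (0,0) to (i,j) that never touch a point with D >= d.
     */
    static double exactP(double d, int n, int m) {
        // At each lattice point D*n*m = |i*m - j*n| is an integer, so compare
        // in integers. The 1e-9 tolerance is an assumption meant to absorb
        // rounding in a caller-supplied d that was itself computed as i/n - j/m.
        long k = (long) Math.ceil(d * n * m - 1e-9);
        double[] u = new double[m + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                if (Math.abs((long) i * m - (long) j * n) >= k) {
                    u[j] = 0.0; // any path through (i, j) has D >= d
                } else if (i == 0 && j == 0) {
                    u[j] = 1.0; // the empty path
                } else {
                    // u[j] still holds row i-1, i.e. the point (i-1, j).
                    double fromPrevRow = i > 0 ? u[j] * i / (i + j) : 0.0;
                    // u[j-1] already holds row i, i.e. the point (i, j-1).
                    double fromPrevCol = j > 0 ? u[j - 1] * j / (i + j) : 0.0;
                    u[j] = fromPrevRow + fromPrevCol;
                }
            }
        }
        return 1.0 - u[m]; // u[m] = P(D < d)
    }
}
```

For n = m = 2 and d = 1 this returns 1/3 (up to rounding), matching the enumeration; for n = m = 3 and d = 1 it returns 2/20 = 0.1, since only the paths through (3,0) and (0,3) reach D = 1.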



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)