List: jakarta-commons-dev
Subject: [jira] [Created] (MATH-1310) Improve accuracy and performance of 2-sample Kolmogorov-Smirnov test
From: "Phil Steitz (JIRA)" <jira () apache ! org>
Date: 2015-12-31 20:04:39
Message-ID: JIRA.12925144.1451592220000.167.1451592279779 () Atlassian ! JIRA
Phil Steitz created MATH-1310:
---------------------------------
Summary: Improve accuracy and performance of 2-sample Kolmogorov-Smirnov test
Key: MATH-1310
URL: https://issues.apache.org/jira/browse/MATH-1310
Project: Commons Math
Issue Type: Bug
Affects Versions: 3.5
Reporter: Phil Steitz
Fix For: 3.6
As of 3.5, the exactP method used to compute exact p-values for 2-sample
Kolmogorov-Smirnov tests is very slow, because it is based on a naive implementation that
enumerates all n-m partitions of the combined sample. As a result, its use is not
recommended for problems where the product of the two sample sizes exceeds 100, and
the kolmogorovSmirnovTest method uses it only when the product does not exceed 100. To handle
sample size products between 100 and 10000, the point above which the asymptotic KS
distribution can be used, this method currently uses Monte Carlo simulation. Convergence
is poor for many problem instances, resulting in inaccurate results.
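The size-based selection described above can be pictured with a small standalone sketch. It is not the Commons Math source: the class, enum, and constant names are hypothetical, and the handling of the exact boundary values (a product of exactly 100 or exactly 10000) is an assumption, since the description only gives the ranges.

```java
// Hypothetical illustration of the dispatch described in this issue.
// Names and boundary handling are assumptions, not the library's code.
final class KsMethodChoice {
    enum Method { EXACT, MONTE_CARLO, ASYMPTOTIC }

    // Thresholds on the product n * m, taken from the issue text.
    static final long SMALL_PRODUCT = 100;
    static final long LARGE_PRODUCT = 10000;

    private KsMethodChoice() {}

    static Method choose(int n, int m) {
        // Use long arithmetic so large sample sizes cannot overflow int.
        long product = (long) n * m;
        if (product <= SMALL_PRODUCT) {
            return Method.EXACT;        // naive enumeration is still affordable
        }
        if (product < LARGE_PRODUCT) {
            return Method.MONTE_CARLO;  // the range this issue wants to improve
        }
        return Method.ASYMPTOTIC;       // asymptotic KS distribution
    }
}
```

With these assumed boundaries, two samples of size 50 (product 2500) fall in the Monte Carlo range, which is the case the proposed faster exactP is meant to cover.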
To eliminate the need for the Monte Carlo simulation and increase the performance of
exactP itself, a faster exactP implementation should be added. This can be
implemented by unwinding the recursive functions defined in Table 5.2 of:
Wilcox, Rand. 2012. Introduction to Robust Estimation and Hypothesis Testing,
3rd Ed., Chapter 5. Academic Press.
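To give a feel for what an unwound recursion could look like, here is a minimal sketch based on the standard lattice-path counting argument for the 2-sample statistic. Whether it matches the functions in Wilcox's Table 5.2 term for term is an assumption, and it is not the Commons Math implementation. The class and method names are hypothetical. It assumes no ties, and it takes the observed statistic as the integer k = d * n * m so that all comparisons are exact. The work is O(n * m) instead of enumerating every split of the combined sample.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

// Hypothetical sketch: exact P(D_{n,m} >= k / (n * m)) under the null, no ties.
final class KsExactSketch {
    private KsExactSketch() {}

    static double exactP(int n, int m, long k) {
        if (k <= 0) {
            return 1.0; // D is never negative
        }
        // c[j] = number of monotone paths from (0,0) to (i,j) whose every
        // point (a,b) satisfies |a*m - b*n| < k, i.e. |a/n - b/m| < d.
        BigInteger[] c = new BigInteger[m + 1];
        c[0] = BigInteger.ONE;
        for (int j = 1; j <= m; j++) {
            c[j] = inside(0, j, n, m, k) ? c[j - 1] : BigInteger.ZERO;
        }
        for (int i = 1; i <= n; i++) {
            if (!inside(i, 0, n, m, k)) {
                c[0] = BigInteger.ZERO;
            }
            for (int j = 1; j <= m; j++) {
                // A path reaches (i,j) from (i-1,j) (old c[j]) or (i,j-1).
                c[j] = inside(i, j, n, m, k) ? c[j].add(c[j - 1]) : BigInteger.ZERO;
            }
        }
        // Total number of paths is the binomial coefficient C(n + m, n).
        BigInteger total = BigInteger.ONE;
        for (int i = 1; i <= n; i++) {
            total = total.multiply(BigInteger.valueOf((long) m + i))
                         .divide(BigInteger.valueOf(i)); // exact at every step
        }
        double within = new BigDecimal(c[m])
                .divide(new BigDecimal(total), MathContext.DECIMAL64)
                .doubleValue();
        return 1.0 - within;
    }

    private static boolean inside(int i, int j, int n, int m, long k) {
        return Math.abs((long) i * m - (long) j * n) < k;
    }
}
```

For example, with n = m = 2 and d = 1 (so k = 4), only 2 of the 6 possible splits reach D = 1, and the sketch returns 1/3. BigInteger is used here for clarity; a production version would more likely use scaled doubles.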
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)