I'm having trouble using the Kolmogorov-Smirnov test in SciPy (scipy.stats.kstest). The online documentation (http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html) says it takes the sample, the cdf to compare against (with the option of simply naming one of the scipy.stats distributions), the cdf arguments, and several optional values.

As long as the chosen cdf does not require any additional arguments, everything works fine:

teststat,pval=stats.kstest(sample,'norm')

(where sample is a list of values). However, with distributions that require additional arguments, such as t or chi-squared, it won't work for me. It correctly complains if no further arguments are given:

teststat,pval=stats.kstest(sample,'t')

TypeError: _cdf() takes exactly 3 arguments (2 given)

if an argument is given,

teststat,pval=stats.kstest(sample,'t',24)

it complains

TypeError: cdf() argument after * must be a sequence, not int

Now I'm not exactly sure what that means, but it seems that it wants not the int 24 but a sequence of one int, (24). However:

teststat,pval=stats.kstest(sample,'t',(24))

TypeError: cdf() argument after * must be a sequence, not int

Defining the distribution manually does not yield better results either, because what I pass in is not considered callable:

numargs = stats.t.numargs
[ df ] = [0.9,] * numargs
rv = stats.t(df)
teststat,pval=stats.kstest(sample,stats.t.cdf(numpy.linspace(0, numpy.minimum(rv.dist.b, 3)),df))

TypeError: 'numpy.ndarray' object is not callable

What do I do to make it work? (Google searches for the kstest function or for the various error messages do not turn up anything useful.)

Thanks


1 Answer


Looking at this error:

TypeError: cdf() argument after * must be a sequence, not int

makes me think that you're right, and it wants a sequence, not an integer. The docs say

args : tuple, sequence
    distribution parameters, used if rvs or cdf are strings
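As a side note on why writing (24) did not help: in Python, parentheses alone only group an expression, and it is the trailing comma that makes a one-element tuple. The following minimal sketch illustrates this in plain Python; show_args is a hypothetical stand-in for any function that unpacks its parameters with *, not part of SciPy.

```python
# Parentheses alone only group an expression; the trailing comma makes the tuple.
not_a_tuple = (24)
one_tuple = (24,)

print(type(not_a_tuple).__name__)  # int
print(type(one_tuple).__name__)    # tuple


def show_args(*args):
    """Hypothetical stand-in for a function that receives unpacked parameters."""
    return args


# Unpacking a tuple works.
print(show_args(*one_tuple))

# Unpacking a bare int raises a TypeError, much like the one in the question.
try:
    show_args(*not_a_tuple)
except TypeError as exc:
    print("TypeError:", exc)
```

So (24) is the same value as 24, which would explain why the error message did not change.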

Passing the degrees of freedom as a one-element tuple seems to work:

>>> import scipy.stats
>>> sample = scipy.stats.t(1).rvs(size=10**6)
>>> scipy.stats.kstest(sample, 't', (1,))
(0.0006249662221899932, 0.82960203415652445)

or more explicitly:

>>> scipy.stats.kstest(sample, 't', args=(1,))
(0.0006249662221899932, 0.82960203415652445)
answered 2012-08-27T04:46:51.137