statsmodels.stats.gof.powerdiscrepancy

statsmodels.stats.gof.powerdiscrepancy(observed, expected, lambd=0.0, axis=0, ddof=0)[source]

Calculates the power discrepancy, a class of goodness-of-fit tests, as a measure of discrepancy between observed and expected data.

This contains several goodness-of-fit tests as special cases; see the description of lambd, the exponent of the power discrepancy. The p-value is based on the asymptotic chi-square distribution of the test statistic.

freeman_tukey: D(x|theta) = sum_j (sqrt(x_j) - sqrt(e_j))^2
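As a quick, added cross-check (not part of the original docstring): since lambd='pearson' (a = 1) gives the standard chi-square test statistic, its output can be compared against scipy.stats.chisquare. The sketch below assumes scipy is installed; the values quoted in the comments are the ones that also appear in the Examples section.

import numpy as np
from scipy import stats
from statsmodels.stats.gof import powerdiscrepancy

observed = np.array([2., 4., 2., 1., 1.])
expected = np.array([2., 2., 2., 2., 2.])   # same total count as observed

# Power discrepancy with the Pearson exponent (a = 1) ...
d_obs, pval = powerdiscrepancy(observed, expected, lambd='pearson')

# ... compared with the classic chi-square test on the same counts.
stat, p = stats.chisquare(observed, f_exp=expected)

print(d_obs, pval)   # about 3.0 and 0.5578 (cf. the lambd=1 example below)
print(stat, p)       # should match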

Parameters:
observed : Iterable

    Observed values.

expected : Iterable

    Expected values.

lambd : {float, str}
  • float : exponent a of the power discrepancy

  • 'loglikeratio' : a = 0

  • 'freeman_tukey' : a = -0.5

  • 'pearson' : a = 1 (standard chi-square test statistic)

  • 'modified_loglikeratio' : a = -1

  • 'cressie_read' : a = 2/3

  • 'neyman' : a = -2 (Neyman-modified chi-square)

  Each string alias selects the corresponding exponent a; see the sketch after this parameter list.

axis : int

    Axis over which the observations of one series are taken.

ddof : int

    Degrees-of-freedom correction for the chi-square reference distribution of the p-value.
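A minimal added sketch (not from the docstring) showing that a string alias for lambd and its numeric exponent are interchangeable, using the same data as in the Examples section:

import numpy as np
from statsmodels.stats.gof import powerdiscrepancy

observed = np.array([2., 4., 2., 1., 1.])
expected = np.array([0.2, 0.2, 0.2, 0.2, 0.2])   # cell probabilities summing to 1

# Each alias maps to an exponent a, so both calls should return identical results.
for alias, a in [('pearson', 1), ('cressie_read', 2 / 3.0), ('loglikeratio', 0)]:
    d_alias, p_alias = powerdiscrepancy(observed, expected, lambd=alias)
    d_num, p_num = powerdiscrepancy(observed, expected, lambd=a)
    print(alias, d_alias, p_alias,
          np.allclose(d_alias, d_num), np.allclose(p_alias, p_num))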

Returns:
D_obs
    Discrepancy of the observed values.
pvalue
    p-value based on the asymptotic chi-square distribution of D_obs.

References

Cressie, Noel and Timothy R. C. Read, Multinomial Goodness-of-Fit Tests, Journal of the Royal Statistical Society, Series B (Methodological), Vol. 46, No. 3 (1984), pp. 440-464.

Campbell B. Read, Freeman-Tukey chi-squared goodness-of-fit statistics, Statistics & Probability Letters 18 (1993), 271-278.

Nobuhiro Taneichi, Yuri Sekiya and Akio Suzukawa, Asymptotic Approximations for the Distributions of the Multinomial Goodness-of-Fit Statistics under Local Alternatives, Journal of Multivariate Analysis 81 (2002), 335-359.

Steele, M., C. Hurst and J. Chaseling, Simulated Power of Discrete Goodness-of-Fit Tests for Likert Type Data.

Examples

>>> import numpy as np
>>> from statsmodels.stats.gof import powerdiscrepancy
>>> observed = np.array([ 2.,  4.,  2.,  1.,  1.])
>>> expected = np.array([ 0.2,  0.2,  0.2,  0.2,  0.2])

For checking the correct dimensions with multiple series:

>>> powerdiscrepancy(np.column_stack((observed,observed)).T, 10*expected, lambd='freeman_tukey',axis=1)
(array([[ 2.745166,  2.745166]]), array([[ 0.6013346,  0.6013346]]))
>>> powerdiscrepancy(np.column_stack((observed,observed)).T, 10*expected,axis=1)
(array([[ 2.77258872,  2.77258872]]), array([[ 0.59657359,  0.59657359]]))
>>> powerdiscrepancy(np.column_stack((observed,observed)).T, 10*expected, lambd=0,axis=1)
(array([[ 2.77258872,  2.77258872]]), array([[ 0.59657359,  0.59657359]]))
>>> powerdiscrepancy(np.column_stack((observed,observed)).T, 10*expected, lambd=1,axis=1)
(array([[ 3.,  3.]]), array([[ 0.5578254,  0.5578254]]))
>>> powerdiscrepancy(np.column_stack((observed,observed)).T, 10*expected, lambd=2/3.0,axis=1)
(array([[ 2.89714546,  2.89714546]]), array([[ 0.57518277,  0.57518277]]))
>>> powerdiscrepancy(np.column_stack((observed,observed)).T, expected, lambd=2/3.0,axis=1)
(array([[ 2.89714546,  2.89714546]]), array([[ 0.57518277,  0.57518277]]))
>>> powerdiscrepancy(np.column_stack((observed,observed)), expected, lambd=2/3.0, axis=0)
(array([[ 2.89714546,  2.89714546]]), array([[ 0.57518277,  0.57518277]]))

Each random variable can have a different total count/sum:

>>> powerdiscrepancy(np.column_stack((observed,2*observed)), expected, lambd=2/3.0, axis=0)
(array([[ 2.89714546,  5.79429093]]), array([[ 0.57518277,  0.21504648]]))
>>> powerdiscrepancy(np.column_stack((2*observed,2*observed)), expected, lambd=2/3.0, axis=0)
(array([[ 5.79429093,  5.79429093]]), array([[ 0.21504648,  0.21504648]]))
>>> powerdiscrepancy(np.column_stack((2*observed,2*observed)), 20*expected, lambd=2/3.0, axis=0)
(array([[ 5.79429093,  5.79429093]]), array([[ 0.21504648,  0.21504648]]))
>>> powerdiscrepancy(np.column_stack((observed,2*observed)), np.column_stack((10*expected,20*expected)), lambd=2/3.0, axis=0)
(array([[ 2.89714546,  5.79429093]]), array([[ 0.57518277,  0.21504648]]))
>>> powerdiscrepancy(np.column_stack((observed,2*observed)), np.column_stack((10*expected,20*expected)), lambd=-1, axis=0)
(array([[ 2.77258872,  5.54517744]]), array([[ 0.59657359,  0.2357868 ]]))
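A final added cross-check (an interpretation, not a statement from the docstring): judging from the lambd='freeman_tukey' output above (2.745166), the returned statistic appears to equal 4 times the plain Freeman-Tukey sum sum_j (sqrt(x_j) - sqrt(e_j))^2 quoted in the description, which is consistent with the Cressie-Read power-divergence form evaluated at a = -0.5 when observed and expected have the same total.

import numpy as np
from statsmodels.stats.gof import powerdiscrepancy

observed = np.array([2., 4., 2., 1., 1.])
expected = np.full(5, 2.0)   # 10 * [0.2, ...], same total as observed

# Plain Freeman-Tukey sum from the description: sum_j (sqrt(x_j) - sqrt(e_j))^2
ft_sum = np.sum((np.sqrt(observed) - np.sqrt(expected)) ** 2)

d_obs, pval = powerdiscrepancy(observed, expected, lambd='freeman_tukey')

print(ft_sum, 4 * ft_sum, d_obs)   # about 0.6863, 2.7452, and 2.745166 as in the first example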

Last update: Oct 16, 2024