Version 0.16.2 (June 12, 2015)

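This is a minor bug-fix release from 0.16.1. It includes a large number of bug fixes along with some new features (the pipe() method), enhancements, and performance improvements.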

We recommend that all users upgrade to this version.

Highlights include:

A new pipe method (see the Pipe section below)
Documentation on how to use numba with pandas (see the pandas documentation)

New features

Pipe

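We've introduced a new method, DataFrame.pipe(). As the name suggests, pipe should be used to pipe data through a chain of function calls. The goal is to avoid confusing nested function calls like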

# df is a DataFrame
# f, g, and h are functions that take and return DataFrames
f(g(h(df), arg1=1), arg2=2, arg3=3)

The logic flows from the inside out, and function names are separated from their keyword arguments. This can be rewritten as

(
    df.pipe(h)
    .pipe(g, arg1=1)
    .pipe(f, arg2=2, arg3=3)
)

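Now both the code and the logic flow from top to bottom. Keyword arguments sit next to their functions. Overall the code is much more readable.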
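To make the comparison concrete, here is a minimal runnable sketch. The functions h, g, and f below are hypothetical stand-ins (they are not part of pandas), and the column names are made up for illustration; each function takes a DataFrame as its first argument and returns a DataFrame.

```python
import pandas as pd


# Hypothetical helpers for illustration; each takes and returns a DataFrame.
def h(df):
    # Drop rows with missing values.
    return df.dropna()


def g(df, arg1):
    # Add a column holding x shifted by arg1.
    return df.assign(shifted=df["x"] + arg1)


def f(df, arg2, arg3):
    # Add a column computed from the previous step.
    return df.assign(scaled=df["shifted"] * arg2 + arg3)


df = pd.DataFrame({"x": [1.0, None, 3.0]})

# Nested calls: read from the inside out.
nested = f(g(h(df), arg1=1), arg2=2, arg3=3)

# Piped calls: read from top to bottom.
piped = df.pipe(h).pipe(g, arg1=1).pipe(f, arg2=2, arg3=3)

# Both spellings produce the same DataFrame.
print(piped.equals(nested))
```

Each pipe call forwards its extra keyword arguments to the function it wraps, which is why the arguments stay next to the function they belong to.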

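In the example above, the functions f, g, and h each expect the DataFrame as the first positional argument. When the function you wish to apply takes its data anywhere other than the first argument, pass a tuple of (function, keyword) indicating where the DataFrame should flow. For example: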

In [1]: import statsmodels.formula.api as sm

In [2]: bb = pd.read_csv("data/baseball.csv", index_col="id")

# sm.ols takes (formula, data)
In [3]: (
...:     bb.query("h > 0")
...:     .assign(ln_h=lambda df: np.log(df.h))
...:     .pipe((sm.ols, "data"), "hr ~ ln_h + year + g + C(lg)")
...:     .fit()
...:     .summary()
...: )
...:
Out[3]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:                     hr   R-squared:                       0.685
Model:                            OLS   Adj. R-squared:                  0.665
Method:                 Least Squares   F-statistic:                     34.28
Date:                Tue, 22 Nov 2022   Prob (F-statistic):           3.48e-15
Time:                        05:35:23   Log-Likelihood:                -205.92
No. Observations:                  68   AIC:                             421.8
Df Residuals:                      63   BIC:                             432.9
Df Model:                           4
Covariance Type:            nonrobust
===============================================================================
                coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept   -8484.7720   4664.146     -1.819      0.074   -1.78e+04     835.780
C(lg)[T.NL]    -2.2736      1.325     -1.716      0.091      -4.922       0.375
ln_h           -1.3542      0.875     -1.547      0.127      -3.103       0.395
year            4.2277      2.324      1.819      0.074      -0.417       8.872
g               0.1841      0.029      6.258      0.000       0.125       0.243
==============================================================================
Omnibus:                       10.875   Durbin-Watson:                   1.999
Prob(Omnibus):                  0.004   Jarque-Bera (JB):               17.298
Skew:                           0.537   Prob(JB):                     0.000175
Kurtosis:                       5.225   Cond. No.                     1.49e+07
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.49e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
"""

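The pipe method is inspired by Unix pipes, which stream text through processes. More recently, dplyr and magrittr have introduced the popular (%>%) pipe operator for R.

See the documentation for more information. (GH 10129)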
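The (function, keyword) tuple form can also be tried without statsmodels or the baseball data set. The sketch below uses a hypothetical function, column_total, whose DataFrame argument is the keyword data rather than the first positional argument; the function and the column name are assumptions made for illustration only.

```python
import pandas as pd


# Hypothetical function: the DataFrame is passed as the keyword `data`,
# not as the first positional argument.
def column_total(column, data):
    return data[column].sum()


df = pd.DataFrame({"hits": [10, 20, 30]})

# The tuple tells pipe to pass df as the `data` keyword;
# the remaining arguments are forwarded unchanged.
total = df.pipe((column_total, "data"), "hits")

# Equivalent direct call, for comparison.
direct = column_total("hits", data=df)

print(total == direct)
```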

Other enhancements

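Added rsplit to Index/Series StringMethods (GH 10303)
Removed the hard-coded size limits on the DataFrame HTML representation in the IPython notebook and left this to IPython itself (only for IPython v3.0 or greater). This eliminates the duplicate scroll bars that appeared in the notebook with large frames (GH 10231).

Note that the notebook has a toggle output scrolling feature to limit the display of very large frames (click to the left of the output). You can also configure how DataFrames are displayed using the pandas options; see the pandas options documentation.
The axis parameter of DataFrame.quantile now also accepts the axis names index and columns. (GH 9543)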
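As a rough illustration of two of the enhancements above, the sketch below contrasts rsplit with split and passes an axis name to DataFrame.quantile. It assumes a recent pandas release, where the axis names are spelled "index" and "columns"; the sample data is made up.

```python
import pandas as pd

s = pd.Series(["a_b_c", "d_e_f"])

# rsplit splits from the right; n=1 limits it to a single split.
right = s.str.rsplit("_", n=1)

# split with the same arguments starts from the left instead.
left = s.str.split("_", n=1)

df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})

# Axis given by name rather than by number.
by_index = df.quantile(0.5, axis="index")      # median of each column
by_columns = df.quantile(0.5, axis="columns")  # median of each row
```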

API changes

Holiday now raises NotImplementedError if both offset and observance are used in the constructor, instead of returning an incorrect result (GH 10217).
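A minimal sketch of this behavior, assuming a recent pandas release. The holiday name and date are hypothetical; nearest_workday is one of the observance functions shipped in pandas.tseries.holiday.

```python
import pandas as pd
from pandas.tseries.holiday import Holiday, nearest_workday

try:
    # Passing both offset and observance is rejected.
    Holiday(
        "Example Day",  # hypothetical holiday name
        month=7,
        day=4,
        offset=pd.DateOffset(days=1),
        observance=nearest_workday,
    )
    raised = False
except NotImplementedError:
    raised = True

# Either argument on its own is accepted.
observed = Holiday("Example Day", month=7, day=4, observance=nearest_workday)

print(raised)
```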

Performance improvements

Improved Series.resample performance with dtype=datetime64[ns] (GH 7754)
Improved performance of str.split when expand=True (GH 10081)
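For readers unfamiliar with the expand option, the sketch below shows what str.split returns with and without it. It illustrates the behavior only, not the speedup, and the sample strings are made up.

```python
import pandas as pd

s = pd.Series(["a b", "c d"])

# Without expand, each element becomes a list of pieces.
as_lists = s.str.split(" ")

# With expand=True, the pieces are spread across DataFrame columns.
as_frame = s.str.split(" ", expand=True)

print(as_frame.shape)
```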

Bug fixes

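Bug in Series.hist raising an error when given a one-row Series (GH 10214)
Bug where HDFStore.select modified the passed columns list (GH 7212)
Bug in Categorical repr with display.width of None in Python 3 (GH 10087)
Bug in to_json where certain orients combined with a CategoricalIndex would segfault (GH 10317)
Bug where some of the nan functions did not have consistent return dtypes (GH 10251)
Bug in DataFrame.quantile when checking that a valid axis was passed (GH 9543)
Bug in groupby.apply aggregation for Categorical not preserving categories (GH 10138)
Bug in to_csv where date_format was ignored if the datetime was fractional (GH 10209)
Bug in DataFrame.to_json with mixed data types (GH 10289)
Bug in cache updating when consolidating (GH 10264)
Bug in mean() where integer dtypes could overflow (GH 10172)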
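Bug where Panel.from_dict did not set dtype when specified (GH 10058)
Bug in Index.union raising AttributeError when passed array-likes (GH 10149)
Bug where Timestamp's microsecond, quarter, dayofyear, week, and daysinmonth properties returned np.int type instead of built-in int (GH 10050)
Bug in NaT raising AttributeError when accessing the daysinmonth and dayofweek properties (GH 10096)
Bug in Index repr when using the max_seq_items=None setting (GH 10182)
Bug in getting timezone data with dateutil on various platforms (GH 9059, GH 8639, GH 9663, GH 10121)
Bug in displaying datetimes with mixed frequencies; 'ms' datetimes are now displayed to the proper precision (GH 10170)
Bug in setitem where type promotion was applied to the entire block (GH 10280)
Bug where Series arithmetic methods could incorrectly keep names (GH 10068)
Bug in GroupBy.get_group when grouping on multiple keys, one of which is categorical (GH 10132)
Bug where DatetimeIndex and TimedeltaIndex names were lost after timedelta arithmetic (GH 9926)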
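Bug in DataFrame construction from a nested dict with datetime64 (GH 10160)
Bug in Series construction from a dict with datetime64 keys (GH 9456)
Bug in Series.plot(label="LABEL") not correctly setting the label (GH 10119)
Bug in plot not defaulting to the matplotlib axes.grid setting (GH 9792)
Bug in the read_csv parser with engine='python' causing strings that contain an exponent but no decimal point to be parsed as int instead of float (GH 9565)
Bug in Series.align resetting name when fill_value is specified (GH 10067)
Bug in read_csv causing the index name not to be set on an empty DataFrame (GH 10184)
Bug in SparseSeries.abs resetting name (GH 10241)
Bug where TimedeltaIndex slicing could reset freq (GH 10292)
Bug in GroupBy.get_group raising ValueError when the group key contains NaT (GH 6992)
Bug in the SparseSeries constructor ignoring the input data's name (GH 10258)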
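Bug in Categorical.remove_categories causing a ValueError when removing the NaN category if the underlying dtype is floating-point (GH 10156)
Bug where infer_freq inferred a time rule (WOM-5XXX) unsupported by to_offset (GH 9425)
Bug in DataFrame.to_hdf() where the table format would raise a seemingly unrelated error for invalid (non-string) column names. This is now explicitly forbidden. (GH 9057)
Bug in handling the masking of an empty DataFrame (GH 10126)
Bug where the MySQL interface could not handle numeric table/column names (GH 10255)
Bug in read_csv with a date_parser that returned a datetime64 array with a time resolution other than [ns] (GH 10245)
Bug in Panel.apply when the result has ndim=0 (GH 10332)
Bug in read_hdf where auto_close could not be passed (GH 9327)
Bug in read_hdf where open stores could not be used (GH 10330)
Bug in adding empty DataFrames; the result is now a DataFrame that .equals an empty DataFrame (GH 10181)
Bug in to_hdf and HDFStore, which did not check that complib choices were valid (GH 4582, GH 8874)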
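A few of the fixes listed above are easy to check by hand. The sketch below spot-checks four of them and shows the corrected behavior, assuming a recent pandas release; it does not reproduce the original bugs, and the sample values are made up.

```python
import pandas as pd

# GH 10067: align with fill_value should keep each Series name.
left = pd.Series([1, 2], index=[0, 1], name="left")
right = pd.Series([3, 4], index=[1, 2], name="right")
aligned_left, aligned_right = left.align(right, fill_value=0)

# GH 10149: Index.union should accept an array-like such as a list.
combined = pd.Index([1, 2, 3]).union([3, 4])

# GH 10050: these Timestamp properties should be built-in ints.
ts = pd.Timestamp("2015-06-12")
props = [ts.microsecond, ts.quarter, ts.dayofyear, ts.week, ts.daysinmonth]

# GH 10172: the mean of large integers should not overflow.
# The integer sum here (2**63) does not fit in int64.
big = pd.Series([2**62, 2**62], dtype="int64")
big_mean = big.mean()
```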

Contributors

A total of 34 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Andrew Rosenfeld
Artemy Kolchinsky
Bernard Willers +
Christer van der Meeren
Christian Hudon +
Constantine Glen Evans +
Daniel Julius Lasiman +
Evan Wright
Francesco Brundu +
Gaëtan de Menten +
Jake VanderPlas
James Hiebert +
Jeff Reback
Joris Van den Bossche
Justin Lecher +
Ka Wo Chen +
Kevin Sheppard
Mortada Mehyar
Morton Fox +
Robin Wilson +
Sinhrks
Stephan Hoyer
Thomas Grainger
Tom Ajamian
Tom Augspurger
Yoshiki Vázquez Baeza
Younggun Kim
austinc +
behzad nouri
jreback
lexual
rekcahpassyla +
scls19fr
sinhrks