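What's new in 1.2.0 (December 26, 2020)#

These are the changes in pandas 1.2.0. See Release notes for a full changelog including other versions of pandas.

Warning

The xlwt package for writing old-style .xls Excel files is no longer maintained. The xlrd package is now only for reading old-style .xls files.

Previously, the default argument engine=None to read_excel() would result in using the xlrd engine in many cases, including new Excel 2007+ (.xlsx) files. If openpyxl is installed, many of these cases will now default to using the openpyxl engine. See the read_excel() documentation for more details.

Thus, it is strongly encouraged to install openpyxl to read Excel 2007+ (.xlsx) files. Please do not report issues when using xlrd to read .xlsx files. This is no longer supported; switch to openpyxl instead.

Attempting to use the xlwt engine will raise a FutureWarning unless the option io.excel.xls.writer is set to "xlwt". While this option is now deprecated and will also raise a FutureWarning, it can be set globally to suppress the warning. Users are recommended to write .xlsx files using the openpyxl engine instead.

Enhancements#

Optionally disallow duplicate labels#

Series and DataFrame can now be created with the allows_duplicate_labels=False flag to control whether the index or columns can contain duplicate labels (GH 28394). This can be used to prevent the accidental introduction of duplicate labels, which can affect downstream operations.

By default, duplicates continue to be allowed.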

In [1]: pd.Series([1, 2], index=['a', 'a'])
Out[1]:
a    1
a    2
Length: 2, dtype: int64

In [2]: pd.Series([1, 2], index=['a', 'a']).set_flags(allows_duplicate_labels=False)
...
DuplicateLabelError: Index has duplicates.
      positions
label
a        [0, 1]

pandas will propagate the allows_duplicate_labels property through many operations.

In [3]: a = (
   ...:     pd.Series([1, 2], index=['a', 'b'])
   ...:       .set_flags(allows_duplicate_labels=False)
   ...: )

In [4]: a
Out[4]:
a    1
b    2
Length: 2, dtype: int64

# An operation introducing duplicates
In [5]: a.reindex(['a', 'b', 'a'])
...
DuplicateLabelError: Index has duplicates.
      positions
label
a        [0, 2]

[1 rows x 1 columns]

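Warning

This is an experimental feature. Currently, many methods fail to propagate the allows_duplicate_labels value. In future versions it is expected that every method taking or returning one or more DataFrame or Series objects will propagate allows_duplicate_labels.

See Duplicate Labels for more.

The allows_duplicate_labels flag is stored in the new DataFrame.flags attribute. This stores global attributes that apply to the pandas object. This differs from DataFrame.attrs, which stores information that applies to the dataset.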
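
The difference between flags and attrs can be illustrated with a minimal sketch. The Series contents and the "source" key are made up for illustration; only set_flags, flags, attrs and DuplicateLabelError come from pandas itself.

```python
import pandas as pd

ser = pd.Series([1, 2], index=["a", "b"])

# Flags describe the pandas object itself; duplicates are allowed by default.
default_flag = ser.flags.allows_duplicate_labels

# set_flags returns a new object and leaves the original unchanged.
strict = ser.set_flags(allows_duplicate_labels=False)
strict_flag = strict.flags.allows_duplicate_labels

# attrs is a free-form dict for dataset-level metadata (hypothetical key).
strict.attrs["source"] = "example"

# With the flag disabled, an operation that introduces duplicates raises.
try:
    strict.reindex(["a", "b", "a"])
    raised = False
except pd.errors.DuplicateLabelError:
    raised = True

print(default_flag, strict_flag, raised)
```

Because set_flags returns a new object, the original ser keeps allowing duplicates.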

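Passing arguments to fsspec backends#

Many read/write functions have acquired the storage_options optional argument, to pass a dictionary of parameters to the storage backend. This allows, for example, for passing credentials to S3 and GCS storage. The details of what parameters can be passed to which backends can be found in the documentation of the individual storage backends (detailed in the fsspec docs for builtin implementations and linked to from there for external ones). See Section Reading/writing remote files.

GH 35655 has added fsspec support (including storage_options) for reading excel files.

Support for binary file handles in `to_csv`#

to_csv() supports file handles in binary mode (GH 19827 and GH 35058) with encoding (GH 13068 and GH 23854) and compression (GH 22555). If pandas does not automatically detect whether the file handle is opened in binary or text mode, it is necessary to provide mode="wb".

For example: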

In [1]: import io

In [2]: data = pd.DataFrame([0, 1, 2])

In [3]: buffer = io.BytesIO()

In [4]: data.to_csv(buffer, encoding="utf-8", compression="gzip")

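Support for short caption and table position in `to_latex`#

DataFrame.to_latex() now allows one to specify a floating table position (GH 35281) and a short caption (GH 36267).

The keyword position has been added to set the position.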

In [5]: data = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

In [6]: table = data.to_latex(position='ht')

In [7]: print(table)
\begin{table}[ht]
\begin{tabular}{lrr}
\toprule
 & a & b \\
\midrule
0 & 1 & 3 \\
1 & 2 & 4 \\
\bottomrule
\end{tabular}
\end{table}

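Usage of the keyword caption has been extended. Besides taking a single string as an argument, one can optionally provide a tuple (full_caption, short_caption) to add a short caption macro.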

In [8]: data = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

In [9]: table = data.to_latex(caption=('the full long caption', 'short caption'))

In [10]: print(table)
\begin{table}
\caption[short caption]{the full long caption}
\begin{tabular}{lrr}
\toprule
 & a & b \\
\midrule
0 & 1 & 3 \\
1 & 2 & 4 \\
\bottomrule
\end{tabular}
\end{table}

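Change in default floating precision for `read_csv` and `read_table`#

For the C parsing engine, the methods read_csv() and read_table() previously defaulted to a parser that could read floating point numbers slightly incorrectly with respect to the last bit in precision. The option float_precision="high" has always been available to avoid this issue. Beginning with this version, the default is now to use the more accurate parser by making float_precision=None correspond to the high precision parser, and the new option float_precision="legacy" selects the legacy parser. The change to using the higher precision parser by default should have no impact on performance. (GH 17154)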
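
As a rough sketch of how the parser options can be compared, the snippet below reads the same made-up CSV text with each float_precision setting. The sample values and the read_with helper are illustrative assumptions; any difference between parsers is expected only in the last bits.

```python
import io

import pandas as pd

# Hypothetical sample data, chosen only to exercise the float parser.
csv_text = "x\n0.1\n1e-5\n123456789.123456789\n"


def read_with(precision):
    # float_precision only applies to the C parsing engine.
    frame = pd.read_csv(io.StringIO(csv_text), engine="c", float_precision=precision)
    return frame["x"]


default = read_with(None)            # high-precision parser as of 1.2.0
high = read_with("high")
legacy = read_with("legacy")         # the pre-1.2.0 default parser
round_trip = read_with("round_trip")

same_as_high = default.equals(high)
max_legacy_gap = (legacy - round_trip).abs().max()
print(same_as_high, max_legacy_gap)
```

Code that depends on bit-for-bit reproduction of results from older versions can pass float_precision="legacy" explicitly.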

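Experimental nullable data types for float data#

We've added Float32Dtype / Float64Dtype and FloatingArray. These are extension data types dedicated to floating point data that can hold the pd.NA missing value indicator (GH 32265, GH 34307).

While the default float data type already supports missing values using np.nan, these new data types use pd.NA (and its corresponding behavior) as the missing value indicator, in line with the already existing nullable integer and boolean data types.

One example where the behavior of np.nan and pd.NA is different is comparison operations: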

# the default NumPy float64 dtype
In [11]: s1 = pd.Series([1.5, None])

In [12]: s1
Out[12]: 
0    1.5
1    NaN
dtype: float64

In [13]: s1 > 1
Out[13]: 
0     True
1    False
dtype: bool

# the new nullable float64 dtype
In [14]: s2 = pd.Series([1.5, None], dtype="Float64")

In [15]: s2
Out[15]: 
0     1.5
1    <NA>
dtype: Float64

In [16]: s2 > 1
Out[16]: 
0    True
1    <NA>
dtype: boolean

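See the NA semantics doc section for more details on the behavior when using the pd.NA missing value indicator.

As shown above, the dtype can be specified using the "Float64" or "Float32" string (capitalized to distinguish it from the default "float64" data type). Alternatively, you can also use the dtype object: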

In [17]: pd.Series([1.5, None], dtype=pd.Float32Dtype())
Out[17]: 
0     1.5
1    <NA>
dtype: Float32

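Operations with the existing integer or boolean nullable data types that give float results will now also use the nullable floating data types (GH 38178).

Warning

Experimental: the new floating data types are currently experimental, and their behavior or API may still change without warning. Especially the behavior regarding NaN (distinct from NA missing values) is subject to change.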
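
A small sketch of the interaction with nullable integers follows; the values are arbitrary. Dividing an Int64 Series produces float results, so the output is expected to use the nullable Float64 dtype and keep the missing entry as pd.NA.

```python
import pandas as pd

ints = pd.Series([1, 2, None], dtype="Int64")

# True division of nullable integers gives float results.
halves = ints / 2

print(halves.dtype)
print(halves.isna().tolist())
```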

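Index/column name preservation when aggregating#

When aggregating using concat() or the DataFrame constructor, pandas will now attempt to preserve index and column names whenever possible (GH 35847). In the case where all inputs share a common name, this name will be assigned to the result. When the input names do not all agree, the result will be unnamed. Here is an example where the index name is preserved: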

In [18]: idx = pd.Index(range(5), name='abc')

In [19]: ser = pd.Series(range(5, 10), index=idx)

In [20]: pd.concat({'x': ser[1:], 'y': ser[:-1]}, axis=1)
Out[20]: 
       x    y
abc          
1    6.0  6.0
2    7.0  7.0
3    8.0  8.0
4    9.0  NaN
0    NaN  5.0

The same is true for MultiIndex, but the logic is applied separately on a level-by-level basis.
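
The two cases described above (names that agree and names that differ) can be sketched as follows. The Series values and the index names "abc" and "xyz" are illustrative only.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=pd.Index([0, 1, 2], name="abc"))
same = pd.Series([4, 5, 6], index=pd.Index([1, 2, 3], name="abc"))
other = pd.Series([7, 8, 9], index=pd.Index([1, 2, 3], name="xyz"))

# All inputs share the name "abc", so the result keeps it.
shared_name = pd.concat([left, same], axis=1).index.name

# The names disagree, so the result index is unnamed.
mixed_name = pd.concat([left, other], axis=1).index.name

print(shared_name, mixed_name)
```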

GroupBy supports EWM operations directly#

DataFrameGroupBy now supports exponentially weighted window operations directly (GH 16037).

In [21]: df = pd.DataFrame({'A': ['a', 'b', 'a', 'b'], 'B': range(4)})

In [22]: df
Out[22]: 
   A  B
0  a  0
1  b  1
2  a  2
3  b  3

In [23]: df.groupby('A').ewm(com=1.0).mean()
Out[23]: 
            B
A            
a 0  0.000000
  2  1.333333
b 1  1.000000
  3  2.333333

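Additionally, mean supports execution via Numba with the engine and engine_kwargs arguments. Numba must be installed as an optional dependency to use this feature.

Other enhancements#

Added day_of_week (compatibility alias dayofweek) property to Timestamp, DatetimeIndex, Period, PeriodIndex (GH 9605)
Added day_of_year (compatibility alias dayofyear) property to Timestamp, DatetimeIndex, Period, PeriodIndex (GH 9605)
Added set_flags() for setting table-wide flags on a Series or DataFrame (GH 28394)
DataFrame.applymap() now supports na_action (GH 23803)
Index with object dtype supports division and multiplication (GH 34160)
io.sql.get_schema() now supports a schema keyword argument that will add a schema into the create table statement (GH 28486)
DataFrame.explode() and Series.explode() now support exploding of sets (GH 35614)
DataFrame.hist() now supports time series (datetime) data (GH 32590)
Styler.set_table_styles() now allows the direct styling of rows and columns and can be chained (GH 35607)
Styler now allows direct CSS class name addition to individual data cells (GH 36159)
Rolling.mean() and Rolling.sum() use Kahan summation to calculate the mean to avoid numerical problems (GH 10319, GH 11645, GH 13254, GH 32761, GH 36031)
DatetimeIndex.searchsorted(), TimedeltaIndex.searchsorted(), PeriodIndex.searchsorted(), and Series.searchsorted() with datetime-like dtypes will now try to cast string arguments (list-like and scalar) to the matching datetime-like type (GH 36346)
Added methods IntegerArray.prod(), IntegerArray.min(), and IntegerArray.max() (GH 33790)
Calling a NumPy ufunc on a DataFrame with extension types now preserves the extension types when possible (GH 23743)
Calling a binary-input NumPy ufunc on multiple DataFrame objects now aligns, matching the behavior of binary operations and ufuncs on Series (GH 23743). This change has been reverted in pandas 1.2.1, and the behavior of not aligning DataFrames is deprecated instead, see the 1.2.1 release notes.
Where possible, RangeIndex.difference() and RangeIndex.symmetric_difference() will return RangeIndex instead of Int64Index (GH 36564)
DataFrame.to_parquet() now supports MultiIndex for columns in parquet format (GH 34777)
read_parquet() gained a use_nullable_dtypes=True option to use nullable dtypes that use pd.NA as the missing value indicator where possible for the resulting DataFrame (default is False, and only applicable for engine="pyarrow") (GH 31242)
Added Rolling.sem() and Expanding.sem() to compute the standard error of the mean (GH 26476)
Rolling.var() and Rolling.std() use Kahan summation and Welford's method to avoid numerical issues (GH 37051)
DataFrame.corr() and DataFrame.cov() use Welford's method to avoid numerical issues (GH 37448)
DataFrame.plot() now recognizes xlabel and ylabel arguments for plots of type scatter and hexbin (GH 37001)
DataFrame now supports the divmod operation (GH 37165)
DataFrame.to_parquet() now returns a bytes object when no path argument is passed (GH 37105)
Rolling now supports the closed argument for fixed windows (GH 34315)
DatetimeIndex and Series with datetime64 or datetime64tz dtypes now support std (GH 37436)
Window now supports all Scipy window types in win_type with flexible keyword argument support (GH 34556)
testing.assert_index_equal() now has a check_order parameter that allows indexes to be checked in an order-insensitive manner (GH 37478)
read_csv() supports memory-mapping for compressed files (GH 37621)
Added support for the min_count keyword for DataFrame.groupby() and DataFrame.resample() for the functions min, max, first and last (GH 37821, GH 37768)
Improved error reporting for DataFrame.merge() when invalid merge column definitions were given (GH 16228)
Improved numerical stability for Rolling.skew(), Rolling.kurt(), Expanding.skew() and Expanding.kurt() through implementation of Kahan summation (GH 6929)
Improved error reporting for subsetting columns of a DataFrameGroupBy with axis=1 (GH 37725)
Implemented method cross for DataFrame.merge() and DataFrame.join() (GH 5401)
When read_csv(), read_sas() and read_json() are called with chunksize/iterator they can be used in a with statement as they return context managers (GH 38225)
Augmented the list of named colors available for styling Excel exports, enabling all CSS4 colors (GH 38247)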
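
Two of the enhancements listed above, the cross merge (GH 5401) and the context-manager readers (GH 38225), can be sketched briefly. The frames and the CSV text are made-up illustrations.

```python
import io

import pandas as pd

sizes = pd.DataFrame({"size": ["S", "M", "L"]})
colors = pd.DataFrame({"color": ["red", "blue"]})

# how="cross" forms the Cartesian product; no join keys are passed.
combos = sizes.merge(colors, how="cross")
print(combos)

# With chunksize, read_csv returns a reader usable in a with statement,
# which closes the underlying handle when the block exits.
with pd.read_csv(io.StringIO("a\n1\n2\n3\n"), chunksize=2) as reader:
    chunk_lengths = [len(chunk) for chunk in reader]
print(chunk_lengths)
```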

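Notable bug fixes#

These are bug fixes that might have notable behavior changes.

Consistency of DataFrame Reductions#

DataFrame.any() and DataFrame.all() with bool_only=True now determine whether to exclude object-dtype columns on a column-by-column basis, instead of checking if all object-dtype columns can be considered boolean.

This prevents pathological behavior where applying the reduction on a subset of columns could result in a larger Series result. See (GH 37799).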

In [24]: df = pd.DataFrame({"A": ["foo", "bar"], "B": [True, False]}, dtype=object)

In [25]: df["C"] = pd.Series([True, True])

Previous behavior:

In [5]: df.all(bool_only=True)
Out[5]:
C    True
dtype: bool

In [6]: df[["B", "C"]].all(bool_only=True)
Out[6]:
B    False
C    True
dtype: bool

New behavior:

In [26]: df.all(bool_only=True)
Out[26]:
C    True
dtype: bool

In [27]: df[["B", "C"]].all(bool_only=True)
Out[27]:
C    True
dtype: bool

Other DataFrame reductions with numeric_only=None will also avoid this pathological behavior (GH 37827):

In [28]: df = pd.DataFrame({"A": [0, 1, 2], "B": ["a", "b", "c"]}, dtype=object)

Previous behavior:

In [3]: df.mean()
Out[3]: Series([], dtype: float64)

In [4]: df[["A"]].mean()
Out[4]:
A    1.0
dtype: float64

New behavior:

In [3]: df.mean()
Out[3]:
A    1.0
dtype: float64

In [4]: df[["A"]].mean()
Out[4]:
A    1.0
dtype: float64

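Moreover, DataFrame reductions with numeric_only=None will now be consistent with their Series counterparts. In particular, for reductions where the Series method raises TypeError, the DataFrame reduction will now consider that column non-numeric instead of casting to a NumPy array which may have different semantics (GH 36076, GH 28949, GH 21020).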

In [29]: ser = pd.Series([0, 1], dtype="category", name="A")

In [30]: df = ser.to_frame()

Previous behavior:

In [5]: df.any()
Out[5]:
A    True
dtype: bool

New behavior:

In [5]: df.any()
Out[5]: Series([], dtype: bool)

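Increased minimum version for Python#

pandas 1.2.0 supports Python 3.7.1 and higher (GH 35214).

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated (GH 35214). If installed, we now require:

Package	Minimum Version	Required	Changed
numpy	1.16.5	X	X
pytz	2017.3	X	X
python-dateutil	2.7.3	X	
bottleneck	1.2.1		
numexpr	2.6.8		X
pytest (dev)	5.0.1		X
mypy (dev)	0.782		X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed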
beautifulsoup4	4.6.0
fastparquet	0.3.2
fsspec	0.7.4
gcsfs	0.6.0
lxml	4.3.0	X
matplotlib	2.2.3	X
numba	0.46.0
openpyxl	2.6.0	X
pyarrow	0.15.0	X
pymysql	0.7.11	X
pytables	3.5.1	X
s3fs	0.4.0
scipy	1.2.0
sqlalchemy	1.2.8	X
xarray	0.12.3	X
xlrd	1.2.0	X
xlsxwriter	1.0.2	X
xlwt	1.3.0	X
pandas-gbq	0.12.0

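See Dependencies and Optional dependencies for more.

Other API changes#

Sorting in descending order is now stable for Series.sort_values() and Index.sort_values() for Datetime-like Index subclasses. This will affect sort order when sorting a DataFrame on multiple columns, sorting with a key function that produces duplicates, or requesting the sorting index when using Index.sort_values(). When using Series.value_counts(), the count of missing values is no longer necessarily last in the list of duplicate counts. Instead, its position corresponds to the position in the original Series. When using Index.sort_values() for Datetime-like Index subclasses, NaTs previously ignored the na_position argument and were sorted to the beginning. Now they respect na_position, the default being last, same as other Index subclasses (GH 35992)
Passing an invalid fill_value to Categorical.take(), DatetimeArray.take(), TimedeltaArray.take(), or PeriodArray.take() now raises a TypeError instead of a ValueError (GH 37733)
Passing an invalid fill_value to Series.shift() with a CategoricalDtype now raises a TypeError instead of a ValueError (GH 37733)
Passing an invalid value to IntervalIndex.insert() or CategoricalIndex.insert() now raises a TypeError instead of a ValueError (GH 37733)
Attempting to reindex a Series with a CategoricalIndex with an invalid fill_value now raises a TypeError instead of a ValueError (GH 37733)
CategoricalIndex.append() with an index that contains non-category values will now cast instead of raising TypeError (GH 38098)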
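
The na_position change for datetime-like indexes (GH 35992) can be sketched as follows, using an arbitrary three-element DatetimeIndex.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2020-01-02", None, "2020-01-01"])

# NaT is now placed last by default, as for other Index subclasses.
default_sorted = idx.sort_values()

# na_position is respected, so NaT can still be requested first.
nat_first = idx.sort_values(na_position="first")

print(default_sorted)
print(nat_first)
```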

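Deprecations#

Deprecated parameter inplace in MultiIndex.set_codes() and MultiIndex.set_levels() (GH 35626)
Deprecated parameter dtype of method copy() for all Index subclasses. Use the astype() method instead for changing dtype (GH 35853)
Deprecated parameters levels and codes in MultiIndex.copy(). Use the set_levels() and set_codes() methods instead (GH 36685)
Date parser functions parse_date_time(), parse_date_fields(), parse_all_fields() and generic_parser() from pandas.io.date_converters are deprecated and will be removed in a future version; use to_datetime() instead (GH 35741)
DataFrame.lookup() is deprecated and will be removed in a future version, use DataFrame.melt() and DataFrame.loc() instead (GH 35224)
The method Index.to_native_types() is deprecated. Use .astype(str) instead (GH 28867)
Deprecated indexing DataFrame rows with a single datetime-like string as df[string] (given the ambiguity whether it is indexing the rows or selecting a column), use df.loc[string] instead (GH 36179)
Deprecated Index.is_all_dates() (GH 27744)
The default value of regex for Series.str.replace() will change from True to False in a future release. In addition, single character regular expressions will not be treated as literal strings when regex=True is set (GH 24804)
Deprecated automatic alignment on comparison operations between DataFrame and Series, do frame, ser = frame.align(ser, axis=1, copy=False) before e.g. frame == ser (GH 28759)
Rolling.count() with min_periods=None will default to the size of the window in a future version (GH 31302)
Using "outer" ufuncs on DataFrames to return 4d ndarray is now deprecated. Convert to an ndarray first (GH 23743)
Deprecated slice-indexing on tz-aware DatetimeIndex with naive datetime objects, to match scalar indexing behavior (GH 36148)
Index.ravel() returning a np.ndarray is deprecated, in the future this will return a view on the same index (GH 19956)
Deprecated use of strings denoting units with 'M', 'Y' or 'y' in to_timedelta() (GH 36666)
Index methods &, |, and ^ behaving as the set operations Index.intersection(), Index.union(), and Index.symmetric_difference(), respectively, are deprecated and in the future will behave as pointwise boolean operations matching Series behavior. Use the named set methods instead (GH 36758)
Categorical.is_dtype_equal() and CategoricalIndex.is_dtype_equal() are deprecated and will be removed in a future version (GH 37545)
Series.slice_shift() and DataFrame.slice_shift() are deprecated, use Series.shift() or DataFrame.shift() instead (GH 37601)
Partial slicing on unordered DatetimeIndex objects with keys that are not in the index is deprecated and will be removed in a future version (GH 18531)
The how keyword in PeriodIndex.astype() is deprecated and will be removed in a future version, use index.to_timestamp(how=how) instead (GH 37982)
Deprecated Index.asi8() for Index subclasses other than DatetimeIndex, TimedeltaIndex, and PeriodIndex (GH 37877)
The inplace parameter of Categorical.remove_unused_categories() is deprecated and will be removed in a future version (GH 37643)
The null_counts parameter of DataFrame.info() is deprecated and replaced by show_counts. It will be removed in a future version (GH 37999)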
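
A few of the recommended replacements from the list above are sketched here. The data is invented for illustration; the methods shown are the alternatives named in the corresponding entries.

```python
import pandas as pd

left = pd.Index([1, 2, 3])
right = pd.Index([2, 3, 4])

# Named set methods instead of the &, | and ^ operators (GH 36758).
common = left.intersection(right)
either = left.union(right)
exclusive = left.symmetric_difference(right)

# Pass regex explicitly so the result does not depend on the changing default (GH 24804).
prices = pd.Series(["1.50", "2.75"])
no_dots = prices.str.replace(".", "", regex=False)

# Use .loc with a datetime-like string to select rows (GH 36179).
frame = pd.DataFrame({"v": [1, 2, 3]}, index=pd.date_range("2020-01-01", periods=3))
january = frame.loc["2020-01"]

print(common, either, exclusive)
print(no_dots.tolist(), len(january))
```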

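Calling NumPy ufuncs on non-aligned DataFrames

Calling NumPy ufuncs on non-aligned DataFrames changed behavior in pandas 1.2.0 (to align the inputs before calling the ufunc), but this change was reverted in pandas 1.2.1. The behavior of not aligning is now deprecated instead, see the 1.2.1 release notes for more details.

Performance improvements#

Performance improvements when creating DataFrame or Series with dtype str or StringDtype from an array with many string elements (GH 36304, GH 36317, GH 36325, GH 36432, GH 37371)
Performance improvement in DataFrameGroupBy.agg() and SeriesGroupBy.agg() with the numba engine (GH 35759)
Performance improvements when creating Series.map() from a huge dictionary (GH 34717)
Performance improvement in DataFrameGroupBy.transform() and SeriesGroupBy.transform() with the numba engine (GH 36240)
Styler uuid method altered to compress data transmission over the web whilst maintaining a reasonably low table collision probability (GH 36345)
Performance improvement in to_datetime() with non-ns time unit for float dtype columns (GH 20445)
Performance improvement in setting values on an IntervalArray (GH 36310)
The internal index method _shallow_copy() now makes the new index and original index share cached attributes, avoiding creating these again if they were already created on either. This can speed up operations that depend on creating copies of existing indexes (GH 36840)
Performance improvement in RollingGroupby.count() (GH 35625)
Performance decrease in Rolling.min() and Rolling.max() for fixed windows (GH 36567)
Reduced peak memory usage in DataFrame.to_pickle() when using protocol=5 in python 3.8+ (GH 34244)
Faster dir calls when the object has many index labels, e.g. dir(ser) (GH 37450)
Performance improvement in ExpandingGroupby (GH 37064)
Performance improvement in Series.astype() and DataFrame.astype() for Categorical (GH 8628)
Performance improvement in DataFrame.groupby() for float dtype (GH 28303); changes to the underlying hash function can lead to changes in the sort ordering of float-based indexes (e.g. Index.value_counts())
Performance improvement in pd.isin() for inputs with more than 1e6 elements (GH 36611)
Performance improvement for DataFrame.__setitem__() with list-like indexers (GH 37954)
read_json() now avoids reading the entire file into memory when chunksize is specified (GH 34548)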

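Bug fixes#

Categorical#

Categorical.fillna() will always return a copy, validate a passed fill value regardless of whether there are any NAs to fill, and disallow NaT as a fill value for numeric categories (GH 36530)
Bug in Categorical.__setitem__() that incorrectly raised when trying to set a tuple value (GH 20439)
Bug in CategoricalIndex.equals() incorrectly casting non-category entries to np.nan (GH 37667)
Bug in CategoricalIndex.where() incorrectly setting non-category entries to np.nan instead of raising TypeError (GH 37977)
Bug in Categorical.to_numpy() and np.array(categorical) with tz-aware datetime64 categories incorrectly dropping the time zone information instead of casting to object dtype (GH 38136)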

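Datetime-like#

Bug in DataFrame.combine_first() that would convert a datetime-like column on the other DataFrame to integer when the column is not present in the original DataFrame (GH 28481)
Bug in DatetimeArray.date where a ValueError would be raised with a read-only backing array (GH 33530)
Bug in NaT comparisons failing to raise TypeError on invalid inequality comparisons (GH 35046)
Bug in DateOffset where attributes reconstructed from pickle files differ from original objects when input values exceed normal ranges (e.g. months=12) (GH 34511)
Bug in DatetimeIndex.get_slice_bound() where datetime.date objects were not accepted, or naive Timestamp with a tz-aware DatetimeIndex (GH 35690)
Bug in DatetimeIndex.slice_locs() where datetime.date objects were not accepted (GH 34077)
Bug in DatetimeIndex.searchsorted(), TimedeltaIndex.searchsorted(), PeriodIndex.searchsorted(), and Series.searchsorted() with datetime64, timedelta64 or Period dtype where placement of NaT values was inconsistent with NumPy (GH 36176, GH 36254)
Inconsistency in DatetimeArray, TimedeltaArray, and PeriodArray method __setitem__ casting arrays of strings to datetime-like scalars but not scalar strings (GH 36261)
Bug in DatetimeArray.take() incorrectly allowing fill_value with a mismatched time zone (GH 37356)
Bug in DatetimeIndex.shift incorrectly raising when shifting empty indexes (GH 14811)
Timestamp and DatetimeIndex comparisons between tz-aware and tz-naive objects now follow the standard library datetime behavior, returning True/False for !=/== and raising for inequality comparisons (GH 28507)
Bug in DatetimeIndex.equals() and TimedeltaIndex.equals() incorrectly considering int64 indexes as equal (GH 36744)
Series.to_json(), DataFrame.to_json(), and read_json() now implement time zone parsing when orient structure is table (GH 35973)
astype() now attempts to convert to datetime64[ns, tz] directly from object with the time zone inferred from the string (GH 35973)
Bug in TimedeltaIndex.sum() and Series.sum() with timedelta64 dtype on an empty index or series returning NaT instead of Timedelta(0) (GH 31751)
Bug in DatetimeArray.shift() incorrectly allowing fill_value with a mismatched time zone (GH 37299)
Bug in adding a BusinessDay with nonzero offset to a non-scalar other (GH 37457)
Bug in to_datetime() with a read-only array incorrectly raising (GH 34857)
Bug in Series.isin() with datetime64[ns] dtype and DatetimeIndex.isin() incorrectly casting integers to datetimes (GH 36621)
Bug in Series.isin() with datetime64[ns] dtype and DatetimeIndex.isin() failing to consider tz-aware and tz-naive datetimes as always different (GH 35728)
Bug in Series.isin() with PeriodDtype dtype and PeriodIndex.isin() failing to consider arguments with different PeriodDtype as always different (GH 37528)
Period constructor now correctly handles nanoseconds in the value argument (GH 34621 and GH 17053)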

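Timedelta#

Bug in TimedeltaIndex, Series, and DataFrame floor-division with timedelta64 dtypes and NaT in the denominator (GH 35529)
Bug in parsing of ISO 8601 durations in Timedelta and to_datetime() (GH 29773, GH 36204)
Bug in to_timedelta() with a read-only array incorrectly raising (GH 34857)
Bug in Timedelta incorrectly truncating to the sub-second portion of a string input when it has precision higher than nanoseconds (GH 36738)

Timezones#

Bug in date_range() raising AmbiguousTimeError for valid input with ambiguous=False (GH 35297)
Bug in Timestamp.replace() losing fold information (GH 37610)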

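Numeric#

Bug in to_numeric() where float precision was incorrect (GH 31364)
Bug in DataFrame.any() with axis=1 and bool_only=True ignoring the bool_only keyword (GH 32432)
Bug in Series.equals() where a ValueError was raised when NumPy arrays were compared to scalars (GH 35267)
Bug in Series where two Series each have a DatetimeIndex with different time zones having those indexes incorrectly changed when performing arithmetic operations (GH 33671)
Bug in pandas.testing module functions when used with check_exact=False on complex numeric types (GH 28235)
Bug in DataFrame.__rmatmul__() error handling reporting transposed shapes (GH 21581)
Bug in Series flex arithmetic methods where the result when operating with a list, tuple or np.ndarray would have an incorrect name (GH 36760)
Bug in IntegerArray multiplication with timedelta and np.timedelta64 objects (GH 36870)
Bug in MultiIndex comparison with tuple incorrectly treating the tuple as array-like (GH 21517)
Bug in DataFrame.diff() with datetime64 dtypes including NaT values failing to fill NaT results correctly (GH 32441)
Bug in DataFrame arithmetic ops incorrectly accepting keyword arguments (GH 36843)
Bug in IntervalArray comparisons with Series not returning Series (GH 36908)
Bug in DataFrame allowing arithmetic operations with a list of array-likes with undefined results. Behavior changed to raising ValueError (GH 36702)
Bug in DataFrame.std() with timedelta64 dtype and skipna=False (GH 37392)
Bug in DataFrame.min() and DataFrame.max() with datetime64 dtype and skipna=False (GH 36907)
Bug in DataFrame.idxmax() and DataFrame.idxmin() with mixed dtypes incorrectly raising TypeError (GH 38195)

Conversion#

Bug in DataFrame.to_dict() with orient='records' now returning Python native datetime objects for datetime-like columns (GH 21256)
Bug in Series.astype() conversion from string to float raising in the presence of pd.NA values (GH 37626)

Strings#

Bug in Series.to_string(), DataFrame.to_string(), and DataFrame.to_latex() adding a leading space when index=False (GH 24980)
Bug in to_numeric() raising a TypeError when attempting to convert a string dtype Series containing only numeric strings and NA (GH 37262)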

Interval#

Bug in DataFrame.replace() and Series.replace() where Interval dtypes would be converted to object dtypes (GH 34871)
Bug in IntervalIndex.take() with negative indices and fill_value=None (GH 37330)
Bug in IntervalIndex.putmask() with datetime-like dtype incorrectly casting to object dtype (GH 37968)
Bug in IntervalArray.astype() incorrectly dropping dtype information with a CategoricalDtype object (GH 37984)

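Indexing#

Bug in PeriodIndex.get_loc() incorrectly raising ValueError on non-datelike strings instead of KeyError, causing similar errors in Series.__getitem__(), Series.__contains__(), and Series.loc.__getitem__() (GH 34240)
Bug in Index.sort_values() where, when empty values were passed, the method would break by trying to compare missing values instead of pushing them to the end of the sort order (GH 35584)
Bug in Index.get_indexer() and Index.get_indexer_non_unique() where int64 arrays are returned instead of intp (GH 36359)
Bug in DataFrame.sort_index() where parameter ascending passed as a list on a single level index gives the wrong result (GH 32334)
Bug in DataFrame.reset_index() incorrectly raising a ValueError for input with a MultiIndex with missing values in a level with Categorical dtype (GH 24206)
Bug in indexing with boolean masks on datetime-like values sometimes returning a view instead of a copy (GH 36210)
Bug in DataFrame.__getitem__() and DataFrame.loc.__getitem__() with IntervalIndex columns and a numeric indexer (GH 26490)
Bug in Series.loc.__getitem__() with a non-unique MultiIndex and an empty-list indexer (GH 13691)
Bug in indexing on a Series or DataFrame with a MultiIndex and a level named "0" (GH 37194)
Bug in Series.__getitem__() when using an unsigned integer array as an indexer giving incorrect results or segfaulting instead of raising KeyError (GH 37218)
Bug in Index.where() incorrectly casting numeric values to strings (GH 37591)
Bug in DataFrame.loc() returning an empty result when the indexer is a slice with negative step size (GH 38071)
Bug in Series.loc() and DataFrame.loc() raising when the index was of object dtype and the given numeric label was in the index (GH 26491)
Bug in DataFrame.loc() returning the requested key plus missing values when loc was applied to a single level of a MultiIndex (GH 27104)
Bug in indexing on a Series or DataFrame with a CategoricalIndex using a list-like indexer containing NA values (GH 37722)
Bug in DataFrame.loc.__setitem__() expanding an empty DataFrame with mixed dtypes (GH 37932)
Bug in DataFrame.xs() ignoring droplevel=False for columns (GH 19056)
Bug in DataFrame.reindex() raising IndexingError wrongly for empty DataFrame with tolerance not None or method="nearest" (GH 27315)
Bug in indexing on a Series or DataFrame with a CategoricalIndex using a list-like indexer that contains elements that are in the index's categories but not in the index itself failing to raise KeyError (GH 37901)
Bug on inserting a boolean label into a DataFrame with a numeric Index columns incorrectly casting to integer (GH 36319)
Bug in DataFrame.iloc() and Series.iloc() aligning objects in __setitem__ (GH 22046)
Bug in MultiIndex.drop() not raising if labels are partially found (GH 37820)
Bug in DataFrame.loc() not raising KeyError when a missing combination was given with slice(None) for the remaining levels (GH 19556)
Bug in DataFrame.loc() raising TypeError when a non-integer slice was given to select values from a MultiIndex (GH 25165, GH 24263)
Bug in Series.at() returning a Series with one element instead of a scalar when the index is a MultiIndex with one level (GH 38053)
Bug in DataFrame.loc() returning and assigning elements in the wrong order when the indexer is ordered differently from the MultiIndex to filter (GH 31330, GH 34603)
Bug in DataFrame.loc() and DataFrame.__getitem__() raising KeyError when columns were a MultiIndex with only one level (GH 29749)
Bug in Series.__getitem__() and DataFrame.__getitem__() raising a blank KeyError without missing keys for IntervalIndex (GH 27365)
Bug in setting a new label on a DataFrame or Series with a CategoricalIndex incorrectly raising TypeError when the new label is not among the index's categories (GH 38098)
Bug in Series.loc() and Series.iloc() raising ValueError when inserting a list-like np.array, list or tuple in an object Series of equal length (GH 37748, GH 37486)
Bug in Series.loc() and Series.iloc() setting all the values of an object Series with those of a list-like ExtensionArray instead of inserting it (GH 38271)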

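Missing#

Bug in SeriesGroupBy.transform() now correctly handles missing values for dropna=False (GH 35014)
Bug in Series.nunique() with dropna=True returning incorrect results when both NA and None missing values were present (GH 37566)
Bug in Series.interpolate() where the keyword arguments limit_area and limit_direction had no effect when using methods pad and backfill (GH 31048)

MultiIndex#

Bug in DataFrame.xs() when used with IndexSlice raising TypeError with message "Expected label or tuple of labels" (GH 35301)
Bug in DataFrame.reset_index() with NaT values in the index raising ValueError with message "cannot convert float NaN to integer" (GH 36541)
Bug in DataFrame.combine_first() when used with a MultiIndex containing string and NaN values raising TypeError (GH 36562)
Bug in MultiIndex.drop() dropping NaN values when a non-existing key was given as input (GH 18853)
Bug in MultiIndex.drop() dropping more values than expected when the index has duplicates and is not sorted (GH 33494)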

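I/O#

read_sas() no longer leaks resources on failure (GH 35566)
Bug in DataFrame.to_csv() and Series.to_csv() raising a ValueError when called with a filename in combination with a mode containing b (GH 35058)
Bug in read_csv() with float_precision='round_trip' not handling the decimal and thousands parameters (GH 35365)
to_pickle() and read_pickle() were closing user-provided file objects (GH 35679)
to_csv() now always passes compression arguments for 'gzip' to gzip.GzipFile (GH 28103)
to_csv() did not support zip compression for binary file objects without a filename (GH 35058)
to_csv() and read_csv() did not honor compression and encoding for path-like objects that are internally converted to file-like objects (GH 35677, GH 26124, GH 32392)
DataFrame.to_pickle(), Series.to_pickle(), and read_pickle() did not support compression for file objects (GH 26237, GH 29054, GH 29570)
Bug in LongTableBuilder.middle_separator() duplicating LaTeX longtable entries in the List of Tables of a LaTeX document (GH 34360)
Bug in read_csv() with engine='python' truncating data if multiple items were present in the first row and the first element started with a BOM (GH 36343)
Removed private_key and verbose from read_gbq() as they are no longer supported in pandas-gbq (GH 34654, GH 30200)
Bumped minimum pytables version to 3.5.1 to avoid a ValueError in read_hdf() (GH 24839)
Bug in read_table() and read_csv() when delim_whitespace=True and sep=default (GH 36583)
Bug in DataFrame.to_json() and Series.to_json() when used with lines=True and orient='records' where the last line of the record was not followed by a newline character (GH 36888)
Bug in read_parquet() with fixed offset time zones. The string representation of the time zones was not recognized (GH 35997, GH 36004)
Bug in DataFrame.to_html(), DataFrame.to_string(), and DataFrame.to_latex() ignoring the na_rep argument when float_format was also specified (GH 9046, GH 13828)
Bug in output rendering of complex numbers showing too many trailing zeros (GH 36799)
Bug in HDFStore raising a TypeError when exporting an empty DataFrame with datetime64[ns, tz] dtypes with a fixed HDF5 store (GH 20594)
Bug in HDFStore dropping time zone information when exporting a Series with datetime64[ns, tz] dtypes with a fixed HDF5 store (GH 20594)
read_csv() was closing user-provided binary file handles when engine="c" and an encoding was requested (GH 36980)
Bug in DataFrame.to_hdf() not dropping missing rows with dropna=True (GH 35719)
Bug in read_html() raising a TypeError when supplying a pathlib.Path argument to the io parameter (GH 37705)
DataFrame.to_excel(), Series.to_excel(), DataFrame.to_markdown(), and Series.to_markdown() now support writing to fsspec URLs such as S3 and Google Cloud Storage (GH 33987)
Bug in read_fwf() with skip_blank_lines=True not skipping blank lines (GH 37758)
Parse missing values using read_json() with dtype=False to NaN instead of None (GH 28501)
read_fwf() was inferring compression with compression=None, which was not consistent with the other read_* functions (GH 37909)
DataFrame.to_html() was ignoring the formatters argument for ExtensionDtype columns (GH 36525)
Bumped minimum xarray version to 0.12.3 to avoid reference to the removed Panel class (GH 27101, GH 37983)
DataFrame.to_csv() was re-opening file-like handles that also implement os.PathLike (GH 38125)
Bug in the conversion of a sliced pyarrow.Table with missing values to a DataFrame (GH 38525)
Bug in read_sql_table() raising a sqlalchemy.exc.OperationalError when column names contained a percentage sign (GH 37517)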

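Period#

Bug in DataFrame.replace() and Series.replace() where Period dtypes would be converted to object dtypes (GH 34871)

Plotting#

Bug in DataFrame.plot() rotating xticklabels when subplots=True, even if the x-axis wasn't an irregular time series (GH 29460)
Bug in DataFrame.plot() where a marker letter in the style keyword sometimes caused a ValueError (GH 21003)
Bug in DataFrame.plot.bar() and Series.plot.bar() where tick positions were assigned by value order instead of using the actual value for numeric data or a smart ordering for strings (GH 26186, GH 11465). This fix has been reverted in pandas 1.2.1, see What's new in 1.2.1 (January 20, 2021)
Twinned axes were losing their tick labels, which should only happen to all but the last row or column of 'externally' shared axes (GH 33819)
Bug in Series.plot() and DataFrame.plot() raising a ValueError when the Series or DataFrame was indexed by a TimedeltaIndex with a fixed frequency and the x-axis lower limit was greater than the upper limit (GH 37454)
Bug in DataFrameGroupBy.boxplot() raising a KeyError when subplots=False (GH 16748)
Bug in DataFrame.plot() and Series.plot() overwriting matplotlib's shared y axes behavior when no sharey parameter was passed (GH 37942)
Bug in DataFrame.plot() raising a TypeError with ExtensionDtype columns (GH 32073)

Styler#

Bug in Styler.render() where HTML was generated incorrectly because of a formatting error in the rowspan attribute; it now matches the w3 syntax (GH 38234)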

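Groupby/resample/rolling#

Bug in DataFrameGroupBy.count() and SeriesGroupBy.sum() returning NaN for missing categories when grouped on multiple Categoricals. Now returning 0 (GH 35028)
Bug in DataFrameGroupBy.apply() that would sometimes throw an erroneous ValueError if the grouping axis had duplicate entries (GH 16646)
Bug in DataFrame.resample() that would throw a ValueError when resampling from "D" to "24H" over a transition into daylight saving time (DST) (GH 35219)
Bug when combining methods DataFrame.groupby() with DataFrame.resample() and DataFrame.interpolate() raising a TypeError (GH 35325)
Bug in DataFrameGroupBy.apply() where a non-nuisance grouping column would be dropped from the output columns if another groupby method was called before .apply (GH 34656)
Bug when subsetting columns on a DataFrameGroupBy (e.g. df.groupby('a')[['b']]) would reset the attributes axis, dropna, group_keys, level, mutated, sort, and squeeze to their default values (GH 9959)
Bug in DataFrameGroupBy.tshift() failing to raise ValueError when a frequency cannot be inferred for the index of a group (GH 35937)
Bug in DataFrame.groupby() not always maintaining the column index name for any, all, bfill, ffill, shift (GH 29764)
Bug in DataFrameGroupBy.apply() raising an error with np.nan group(s) when dropna=False (GH 35889)
Bug in Rolling.sum() returning wrong values when dtypes were mixed between float and integer and axis=1 (GH 20649, GH 35596)
Bug in Rolling.count() returning np.nan with FixedForwardWindowIndexer as window, min_periods=0 and only missing values in the window (GH 35579)
Bug where Rolling produces incorrect window sizes when using a PeriodIndex (GH 34225)
Bug in DataFrameGroupBy.ffill() and DataFrameGroupBy.bfill() where a NaN group would return filled values instead of NaN when dropna=True (GH 34725)
Bug in RollingGroupby.count() where a ValueError was raised when specifying the closed parameter (GH 35869)
Bug in DataFrameGroupBy.rolling() returning wrong values with a partial centered window (GH 36040)
Bug in DataFrameGroupBy.rolling() returning wrong values with a time-aware window containing NaN. It now raises ValueError because the windows are not monotonic (GH 34617)
Bug in Rolling.__iter__() where a ValueError was not raised when min_periods was larger than window (GH 37156)
Using Rolling.var() instead of Rolling.std() avoids numerical issues for Rolling.corr() when Rolling.var() is still within floating point precision while Rolling.std() is not (GH 31286)
Bug in DataFrameGroupBy.quantile() and Resampler.quantile() raising TypeError when values were of type Timedelta (GH 29485)
Bug in Rolling.median() and Rolling.quantile() returning wrong values for BaseIndexer subclasses with non-monotonic starting or ending points for windows (GH 37153)
Bug in DataFrame.groupby() dropping nan groups from the result with dropna=False when grouping over a single column (GH 35646, GH 35542)
Bug in DataFrameGroupBy.head(), DataFrameGroupBy.tail(), SeriesGroupBy.head(), and SeriesGroupBy.tail() raising when used with axis=1 (GH 9772)
Bug in DataFrameGroupBy.transform() raising when used with axis=1 and a transformation kernel (e.g. "shift") (GH 36308)
Bug in DataFrameGroupBy.resample() where using .agg with sum produced a different result than just calling .sum (GH 33548)
Bug in DataFrameGroupBy.apply() dropping values on a nan group when returning the same axes as the original frame (GH 38227)
Bug in DataFrameGroupBy.quantile() not handling an array-like q when grouping by columns (GH 33795)
Bug in DataFrameGroupBy.rank() with datetime64tz or period dtype incorrectly casting results to those dtypes instead of returning float64 dtype (GH 38187)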

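Reshaping#

Bug in DataFrame.crosstab() returning incorrect results on inputs with duplicate row names, duplicate column names or duplicate names between row and column labels (GH 22529)
Bug in DataFrame.pivot_table() with aggfunc='count' or aggfunc='sum' returning NaN for missing categories when pivoted on a Categorical. Now returning 0 (GH 31422)
Bug in concat() and the DataFrame constructor where input index names are not preserved in some cases (GH 13475)
Bug in crosstab() when using margins=True and normalize=True (GH 35144)
Bug in DataFrame.stack() where an empty DataFrame.stack would raise an error (GH 36113). Now returning an empty Series with an empty MultiIndex.
Bug in Series.unstack(). Now a Series with a single-level Index trying to unstack raises a ValueError (GH 36113)
Bug in DataFrame.agg() with func={'name':<FUNC>} incorrectly raising TypeError when DataFrame.columns==['Name'] (GH 36212)
Bug in Series.transform() giving incorrect results or raising when the argument func was a dictionary (GH 35811)
Bug in DataFrame.pivot() not preserving MultiIndex level names for columns when rows and columns are both multiindexed (GH 36360)
Bug in DataFrame.pivot() modifying the index argument when columns was passed but values was not (GH 37635)
Bug in DataFrame.join() returning a non-deterministic level order for the resulting MultiIndex (GH 36910)
Bug in DataFrame.combine_first() causing wrong alignment with dtype string and one level of a MultiIndex containing only NA (GH 37591)
Fixed regression in merge() on merging a DatetimeIndex with an empty DataFrame (GH 36895)
Bug in DataFrame.apply() not setting the index of the return value when the func return type is dict (GH 37544)
Bug in DataFrame.merge() and pandas.merge() returning inconsistent ordering in the result for how=right and how=left (GH 35382)
Bug in merge_ordered() not handling list-like left_by or right_by (GH 35269)
Bug in merge_ordered() returning a wrong join result when the length of left_by or right_by equals the number of rows of left or right (GH 38166)
Bug in merge_ordered() not raising when elements in left_by or right_by do not exist in the left columns or right columns (GH 38167)
Bug in DataFrame.drop_duplicates() not validating bool dtype for the ignore_index keyword (GH 38274)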

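ExtensionArray#

Fixed bug where a DataFrame column set to a scalar extension type via a dict instantiation was considered an object type rather than the extension type (GH 35965)
Fixed bug where astype() with equal dtype and copy=False would return a new object (GH 28488)
Fixed bug when applying a NumPy ufunc with multiple outputs to an IntegerArray returning None (GH 36913)
Fixed an inconsistency in PeriodArray's __init__ signature with those of DatetimeArray and TimedeltaArray (GH 37289)
Reductions for BooleanArray, Categorical, DatetimeArray, FloatingArray, IntegerArray, PeriodArray, TimedeltaArray, and PandasArray are now keyword-only methods (GH 37541)
Fixed a bug where a TypeError was wrongly raised if a membership check was made on an ExtensionArray containing nan-like values (GH 37867)

Other#

Bug in DataFrame.replace() and Series.replace() incorrectly raising an AssertionError instead of a ValueError when invalid parameter combinations are passed (GH 36045)
Bug in DataFrame.replace() and Series.replace() with numeric values and string to_replace (GH 34789)
Fixed metadata propagation in Series.abs() and ufuncs called on Series and DataFrames (GH 28283)
Bug in DataFrame.replace() and Series.replace() incorrectly casting from PeriodDtype to object dtype (GH 34871)
Fixed bug in metadata propagation incorrectly copying DataFrame columns as metadata when the column name overlaps with the metadata name (GH 37037)
Fixed metadata propagation in the Series.dt, Series.str accessors, DataFrame.duplicated, DataFrame.stack, DataFrame.unstack, DataFrame.pivot, DataFrame.append, DataFrame.diff, DataFrame.applymap and DataFrame.update methods (GH 28283, GH 37381)
Fixed metadata propagation when selecting columns with DataFrame.__getitem__ (GH 28283)
Bug in Index.intersection() with non-Index failing to set the correct name on the returned Index (GH 38111)
Bug in RangeIndex.intersection() failing to set the correct name on the returned Index in some corner cases (GH 38197)
Bug in Index.difference() failing to set the correct name on the returned Index in some corner cases (GH 38268)
Bug in Index.union() behaving differently depending on whether the operand is an Index or another list-like (GH 36384)
Bug in Index.intersection() with non-matching numeric dtypes casting to object dtype instead of the minimal common dtype (GH 38122)
Bug in IntervalIndex.union() returning an incorrectly-typed Index when empty (GH 38282)
Passing an array with 2 or more dimensions to the Series constructor now raises the more specific ValueError rather than a bare Exception (GH 35744)
Bug in dir where dir(obj) wouldn't show attributes defined on the instance for pandas objects (GH 37173)
Bug in Index.drop() raising InvalidIndexError when the index has duplicates (GH 38051)
Bug in RangeIndex.difference() returning Int64Index in some cases where it should return RangeIndex (GH 38028)
Fixed bug in assert_series_equal() when comparing a datetime-like array with an equivalent non-extension dtype array (GH 37609)
Bug in is_bool_dtype() raising when passed a valid string such as "boolean" (GH 38386)
Fixed regression in logical operators raising ValueError when the columns of a DataFrame are a CategoricalIndex with unused categories (GH 38367)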

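Contributors#

A total of 257 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.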

21CSM +
AbdulMAbdi +
Abhiraj Hinge +
Abhishek Mangla +
Abo7atm +
Adam Spannbauer +
Albert Villanova del Moral
Alex Kirko
Alex Lim +
Alex Thorne +
Aleš Erjavec +
Ali McMaster
Amanda Dsouza +
Amim Knabben +
Andrew Wieteska
Anshoo Rajput +
Anthony Milbourne
Arun12121 +
Asish Mahapatra
Avinash Pancham +
BeanNan +
Ben Forbes +
Brendan Wilby +
Bruno Almeida +
Byron Boulton +
Chankey Pathak
Chris Barnes +
Chris Lynch +
Chris Withers
Christoph Deil +
Christopher Hadley +
Chuanzhu Xu
Coelhudo +
Dan Moore
Daniel Saxton
David Kwong +
David Li +
David Mrva +
Deepak Pandey +
Deepyaman Datta
Devin Petersohn
Dmitriy Perepelkin +
Douglas Hanley +
Dāgs Grīnbergs +
Eli Treuherz +
Elliot Rampono +
Erfan Nariman
Eric Goddard
Eric Leung +
Eric Wieser
Ethan Chen +
Eve +
Eyal Trabelsi +
Fabian Gebhart +
Fangchen Li
Felix Claessen +
Finlay Maguire +
Florian Roscheck +
Gabriel Monteiro
Gautham +
Gerard Jorgensen +
Gregory Livschitz
Hans
Harsh Sharma
Honfung Wong +
Igor Gotlibovych +
Iqrar Agalosi Nureyza
Irv Lustig
Isaac Virshup
Jacob Peacock
Jacob Stevens-Haas +
Jan Müller +
Janus
Jeet Parekh
Jeff Hernandez +
Jeff Reback
Jiaxiang
Joao Pedro Berno Zanutto +
Joel Nothman
Joel Whittier +
John Karasinski +
John McGuigan +
Johnny Pribyl +
Jonas Laursen +
Jonathan Shreckengost +
Joris Van den Bossche
Jose +
JoseNavy +
Josh Temple +
Jun Kudo +
Justin Essert
Justin Sexton +
Kaiqi Dong
Kamil Trocewicz +
Karthik Mathur
Kashif +
Kenny Huynh
Kevin Sheppard
Kumar Shivam +
Leonardus Chen +
Levi Matus +
Lucas Rodés-Guirao +
Luis Pinto +
Lynch +
Marc Garcia
Marco Gorelli
Maria-Alexandra Ilie +
Marian Denes
Mark Graham +
Martin Durant
Matt Roeschke
Matthew Roeschke
Matthias Bussonnier
Maxim Ivanov +
Mayank Chaudhary +
MeeseeksMachine
Meghana Varanasi +
Metehan Kutlu +
Micael Jarniac +
Micah Smith +
Michael Marino
Miroslav Šedivý
Mohammad Jafar Mashhadi
Mohammed Kashif +
Nagesh Kumar C +
Nidhi Zare +
Nikhil Choudhary +
Number42
Oleh Kozynets +
OlivierLuG
Pandas Development Team
Paolo Lammens +
Paul Ganssle
Pax +
Peter Liu +
Philip Cerles +
Pranjal Bhardwaj +
Prayag Savsani +
Purushothaman Srikanth +
Qbiwan +
Rahul Chauhan +
Rahul Sathanapalli +
Rajat Bishnoi +
Ray Bell
Reshama Shaikh +
Richard Shadrach
Robert Bradshaw
Robert de Vries
Rohith295
S Mono +
S.TAKENO +
Sahid Velji +
Sam Cohen +
Sam Ezebunandu +
Sander +
Sarthak +
Sarthak Vineet Kumar +
Satrio H Wicaksono +
Scott Lasley
Shao Yang Hong +
Sharon Woo +
Shubham Mehra +
Simon Hawkins
Sixuan (Cherie) Wu +
Souris Ash +
Steffen Rehberg
Suvayu Ali
Sven
SylvainLan +
T. JEGHAM +
Terji Petersen
Thomas Dickson +
Thomas Heavey +
Thomas Smith
Tobias Pitters
Tom Augspurger
Tomasz Sakrejda +
Torsten Wörtwein +
Ty Mick +
UrielMaD +
Uwe L. Korn
Vikramaditya Gaonkar +
VirosaLi +
W.R +
Warren White +
Wesley Boelrijk +
William Ayd
Yanxian Lin +
Yassir Karroum +
Yong Kai Yi +
Yuanhao Geng +
Yury Mikhaylov +
Yutaro Ikeda
Yuya Takashina +
Zach Brookler +
Zak Kohler +
ZhihuiChen0903 +
abmyii
alexhtn +
asharma13524 +
attack68
beanan +
chinhwee
cleconte987
danchev +
ebardie +
edwardkong
elliot rampono +
estasney +
gabicca
geetha-rangaswamaiah +
gfyoung
guru kiran
hardikpnsp +
icanhazcodeplz +
ivanovmg +
jbrockmendel
jeschwar
jnecus
joooeey +
junk +
krajatcl +
lacrosse91 +
leo +
lpkirwin +
lrjball
lucasrodes +
ma3da +
mavismonica +
mlondschien +
mzeitlin11 +
nguevara +
nrebena
parkdj1 +
partev
patrick
realead
rxxg +
samilAyoub +
sanderland
shawnbrown
sm1899 +
smartvinnetou
ssortman +
steveya +
taytzehao +
tiagohonorato +
timhunderwood
tkmz-n +
tnwei +
tpanza +
vineethraj510 +
vmdhhh +
xinrong-databricks +
yonas kassa +
yonashub +
Ádám Lippai +
