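What's new in 1.3.0 (July 2, 2021)#

These are the changes in pandas 1.3.0. See Release notes for a full changelog including other versions of pandas.

Warning

When reading new Excel 2007+ (.xlsx) files, the default argument engine=None to read_excel() will now result in using the openpyxl engine in all cases when the option io.excel.xlsx.reader is set to "auto". Previously, some cases would use the xlrd engine instead. See What's new 1.2.0 for background on this change.

Enhancements#

Custom HTTP(s) headers when reading csv or json files#

When reading from a remote URL that is not handled by fsspec (e.g. HTTP and HTTPS), the dictionary passed to storage_options will be used to create the headers included in the request. This can be used to control the User-Agent header or send other custom headers (GH 36688). For example: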

In [1]: headers = {"User-Agent": "pandas"}
In [2]: df = pd.read_csv(
   ...:     "https://download.bls.gov/pub/time.series/cu/cu.item",
   ...:     sep="\t",
   ...:     storage_options=headers
   ...: )

Read and write XML documents#

We added I/O support to read and render shallow versions of XML documents with read_xml() and DataFrame.to_xml(). Using lxml as parser, both XPath 1.0 and XSLT 1.0 are available. (GH 27554)

In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
   ...: <data>
   ...:  <row>
   ...:     <shape>square</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides>4.0</sides>
   ...:  </row>
   ...:  <row>
   ...:     <shape>circle</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides/>
   ...:  </row>
   ...:  <row>
   ...:     <shape>triangle</shape>
   ...:     <degrees>180</degrees>
   ...:     <sides>3.0</sides>
   ...:  </row>
   ...:  </data>"""

In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0

In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>

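For more, see Writing XML in the user guide on IO tools.

Styler enhancements#

We provided some focused development on Styler. See also the Styler documentation, which has been revised and improved (GH 39720, GH 39317, GH 40493).

The method Styler.set_table_styles() can now accept more natural CSS language for arguments, such as 'color:red;' instead of [('color', 'red')] (GH 39563)

The methods Styler.highlight_null(), Styler.highlight_min(), and Styler.highlight_max() now allow custom CSS highlighting instead of the default background coloring (GH 40242)

Styler.apply() now accepts functions that return an ndarray when axis=None, making it consistent with the axis=0 and axis=1 behavior (GH 39359)

When incorrectly formatted CSS is given via Styler.apply() or Styler.applymap(), an error is now raised upon rendering (GH 39660)

Styler.format() now accepts the keyword argument escape for optional HTML and LaTeX escaping (GH 40388, GH 41619)

Styler.background_gradient() has gained the argument gmap to supply a specific gradient map for shading (GH 22727)

Styler.clear() now clears Styler.hidden_index and Styler.hidden_columns as well (GH 40484)

Added the method Styler.highlight_between() (GH 39821)

Added the method Styler.highlight_quantile() (GH 40926)

Added the method Styler.text_gradient() (GH 41098)

Added the method Styler.set_tooltips() to allow hover tooltips; this can be used to enhance interactive displays (GH 21266, GH 40284)

Added the parameter precision to the method Styler.format() to control the display of floating point numbers (GH 40134)

Styler rendered HTML output now follows the w3 HTML Style Guide (GH 39626)

Many features of the Styler class are now either partially or fully usable on a DataFrame with non-unique indexes or columns (GH 41143)

One has greater control of the display through separate sparsification of the index or columns using the new styler options, which are also usable via option_context() (GH 41142)

Added the option styler.render.max_elements to avoid browser overload when styling large DataFrames (GH 40712)

Added the method Styler.to_latex() (GH 21673, GH 42320), which also allows some limited CSS conversion (GH 40731)

Added the method Styler.to_html() (GH 13379)

Added the method Styler.set_sticky() to make index and column headers permanently visible in scrolling HTML frames (GH 29072)

DataFrame constructor honors `copy=False` with dict#

When passing a dictionary to DataFrame with copy=False, a copy will no longer be made (GH 32960).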

In [1]: arr = np.array([1, 2, 3])

In [2]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)

In [3]: df
Out[3]: 
   A  B
0  1  1
1  2  2
2  3  3

df["A"] remains a view on arr:

In [4]: arr[0] = 0

In [5]: assert df.iloc[0, 0] == 0

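The default behavior when not passing copy will remain unchanged, i.e. a copy will be made.

PyArrow backed string data type#

We've enhanced the StringDtype, an extension type dedicated to string data. (GH 39908)

It is now possible to specify a storage keyword option to StringDtype. Use pandas options or specify the dtype using dtype='string[pyarrow]' to allow the StringArray to be backed by a PyArrow array instead of a NumPy array of Python objects.

The PyArrow backed StringArray requires pyarrow 1.0.0 or greater to be installed.

Warning

string[pyarrow] is currently considered experimental. The implementation and parts of the API may change without warning.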

In [6]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
Out[6]: 
0     abc
1    <NA>
2     def
dtype: string

You can use the alias "string[pyarrow]" as well.

In [7]: s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")

In [8]: s
Out[8]: 
0     abc
1    <NA>
2     def
dtype: string

You can also create a PyArrow backed string array using pandas options.

In [9]: with pd.option_context("string_storage", "pyarrow"):
   ...:     s = pd.Series(['abc', None, 'def'], dtype="string")
   ...: 

In [10]: s
Out[10]: 
0     abc
1    <NA>
2     def
dtype: string

The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.

In [11]: s.str.upper()
Out[11]: 
0     ABC
1    <NA>
2     DEF
dtype: string

In [12]: s.str.split('b', expand=True).dtypes
Out[12]: 
0    string[pyarrow]
1    string[pyarrow]
dtype: object

String accessor methods returning integers will return a value with Int64Dtype:

In [13]: s.str.count("a")
Out[13]: 
0       1
1    <NA>
2       0
dtype: Int64

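Centered datetime-like rolling windows#

When performing rolling calculations on DataFrame and Series objects with a datetime-like index, a centered datetime-like window can now be used (GH 38780). For example: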

In [14]: df = pd.DataFrame(
   ....:     {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
   ....: )
   ....: 

In [15]: df
Out[15]: 
            A
2020-01-01  0
2020-01-02  1
2020-01-03  2
2020-01-04  3
2020-01-05  4

In [16]: df.rolling("2D", center=True).mean()
Out[16]: 
              A
2020-01-01  0.5
2020-01-02  1.5
2020-01-03  2.5
2020-01-04  3.5
2020-01-05  4.0

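Other enhancements#

DataFrame.rolling(), Series.rolling(), DataFrame.expanding(), and Series.expanding() now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See Window Overview for performance and functional benefits (GH 15095, GH 38995)
ExponentialMovingWindow now supports an online method that can perform mean calculations in an online fashion. See Window Overview (GH 41673)
Added MultiIndex.dtypes (GH 37062)
Added end and end_day options for the origin parameter in DataFrame.resample() (GH 37804)
Improved error message when usecols and names do not match for read_csv() and engine="c" (GH 29042)
Improved consistency of error messages when passing an invalid win_type argument in Window methods (GH 15969)
read_sql_query() now accepts a dtype argument to cast the columnar data from the SQL database based on user input (GH 10285)
read_csv() now raises a ParserWarning if the length of the header or the given names does not match the length of the data when usecols is not specified (GH 21768)
Improved integer type mapping from pandas to SQLAlchemy when using DataFrame.to_sql() (GH 35076)
to_numeric() now supports downcasting of nullable ExtensionDtype objects (GH 33013)
Added support for dict-like names in MultiIndex.set_names and MultiIndex.rename (GH 20421)
read_excel() can now auto-detect .xlsb files and older .xls files (GH 35416, GH 41225)
ExcelWriter now accepts an if_sheet_exists parameter to control the behavior of append mode when writing to existing sheets (GH 40230)
Rolling.sum(), Expanding.sum(), Rolling.mean(), Expanding.mean(), ExponentialMovingWindow.mean(), Rolling.median(), Expanding.median(), Rolling.max(), Expanding.max(), Rolling.min(), and Expanding.min() now support Numba execution with the engine keyword (GH 38895, GH 41267)
DataFrame.apply() can now accept NumPy unary operators as strings, e.g. df.apply("sqrt"), which was already the case for Series.apply() (GH 39116)
DataFrame.apply() can now accept non-callable DataFrame properties as strings, e.g. df.apply("size"), which was already the case for Series.apply() (GH 39116)
DataFrame.applymap() can now accept kwargs to pass on to the user-provided func (GH 39987)
Passing a DataFrame indexer to iloc is now disallowed for Series.__getitem__() and DataFrame.__getitem__() (GH 39004)
Series.apply() can now accept list-like or dictionary-like arguments that aren't lists or dictionaries, e.g. ser.apply(np.array(["sum", "mean"])), which was already the case for DataFrame.apply() (GH 39140)
DataFrame.plot.scatter() can now accept a categorical column for the argument c (GH 12380, GH 31357)
Series.loc() now raises a helpful error message when the Series has a MultiIndex and the indexer has too many dimensions (GH 35349)
read_stata() now supports reading data from compressed files (GH 26599)
Added support for parsing ISO 8601-like timestamps with negative signs to Timedelta (GH 37172)
Added support for unary operators in FloatingArray (GH 38749)
RangeIndex can now be constructed by passing a range object directly, e.g. pd.RangeIndex(range(3)) (GH 12067)
Series.round() and DataFrame.round() now work with nullable integer and floating dtypes (GH 38844)
read_csv() and read_json() expose the argument encoding_errors to control how encoding errors are handled (GH 39450)
DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() use Kleene logic with nullable data types (GH 37506)
DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() return a BooleanDtype for columns with nullable data types (GH 33449)
DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() raise with object data containing pd.NA even when skipna=True (GH 37501)
DataFrameGroupBy.rank() and SeriesGroupBy.rank() now support object-dtype data (GH 38278)
Constructing a DataFrame or Series with the data argument being a Python iterable that is not a NumPy ndarray consisting of NumPy scalars will now result in a dtype with a precision the maximum of the NumPy scalars; this was already the case when data is a NumPy ndarray (GH 40908)
Added keyword sort to pivot_table() to allow non-sorting of the result (GH 39143)
Added keyword dropna to DataFrame.value_counts() to allow counting rows that include NA values (GH 41325)
Series.replace() will now cast results to PeriodDtype where possible instead of object dtype (GH 41526)
Improved error message in the corr and cov methods on Rolling, Expanding, and ExponentialMovingWindow when other is not a DataFrame or Series (GH 41741)
Series.between() can now accept left or right as arguments to inclusive to include only the left or right boundary (GH 40245)
DataFrame.explode() now supports exploding multiple columns. Its column argument now also accepts a list of str or tuples for exploding on multiple columns at the same time (GH 39240)
DataFrame.sample() now accepts the ignore_index argument to reset the index after sampling, similar to DataFrame.drop_duplicates() and DataFrame.sort_values() (GH 38581)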
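
A minimal sketch of a few of the enhancements listed above. The inputs are small made-up examples (the column names id, x, y and k are illustrative assumptions, not part of the release notes):

```python
import pandas as pd

# Series.between() with inclusive="left": keep the left bound, drop the right.
s = pd.Series([1, 2, 3, 4])
left_only = s.between(2, 4, inclusive="left")

# DataFrame.explode() on several list-like columns at once.
df = pd.DataFrame({"id": [0, 1], "x": [[1, 2], [3]], "y": [["a", "b"], ["c"]]})
exploded = df.explode(["x", "y"])

# DataFrame.sample() with ignore_index=True relabels the sampled rows 0..n-1.
sampled = df.sample(n=2, random_state=0, ignore_index=True)

# RangeIndex built directly from a range object.
idx = pd.RangeIndex(range(3))

# DataFrame.value_counts(dropna=False) also counts rows that contain NA.
counts = pd.DataFrame({"k": [1.0, None, None]}).value_counts(dropna=False)
```

Each call relies only on parameters named in the list: inclusive="left", a list passed to column, ignore_index, a range object, and dropna.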

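Notable bug fixes#

These are bug fixes that might have notable behavior changes.

`Categorical.unique` now always maintains same dtype as original#

Previously, when calling Categorical.unique() with categorical data, unused categories in the new array would be removed, making the dtype of the new array different from the original (GH 18291)

As an example of this, given: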

In [17]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)

In [18]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)

In [19]: original = pd.Series(cat)

In [20]: unique = original.unique()

Previous behavior:

In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False

New behavior:

In [21]: unique
Out[21]: 
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']

In [22]: original.dtype == unique.dtype
Out[22]: True

Preserve dtypes in `DataFrame.combine_first()`#

DataFrame.combine_first() will now preserve dtypes (GH 7509)

In [23]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])

In [24]: df1
Out[24]: 
   A  B
0  1  1
1  2  2
2  3  3

In [25]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])

In [26]: df2
Out[26]: 
   B  C
2  4  1
3  5  2
4  6  3

In [27]: combined = df1.combine_first(df2)

Previous behavior:

In [1]: combined.dtypes
Out[1]:
A    float64
B    float64
C    float64
dtype: object

New behavior:

In [28]: combined.dtypes
Out[28]: 
A    float64
B      int64
C    float64
dtype: object

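Groupby methods agg and transform no longer change return dtype for callables#

Previously the methods DataFrameGroupBy.aggregate(), SeriesGroupBy.aggregate(), DataFrameGroupBy.transform(), and SeriesGroupBy.transform() might cast the result dtype when the argument func is callable, possibly leading to undesirable results (GH 21240). The cast would occur if the result is numeric and casting back to the input dtype does not change any values as measured by np.allclose. Now no such casting occurs.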

In [29]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})

In [30]: df
Out[30]: 
   key      a     b
0    1   True  True
1    1  False  True

Previous behavior:

In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
        a  b
key
1    True  2

New behavior:

In [31]: df.groupby('key').agg(lambda x: x.sum())
Out[31]: 
     a  b
key      
1    1  2

`float` result for `DataFrameGroupBy.mean()`, `DataFrameGroupBy.median()`, and `DataFrameGroupBy.var()`, `SeriesGroupBy.mean()`, `SeriesGroupBy.median()`, and `SeriesGroupBy.var()`#

Previously, these methods could result in different dtypes depending on the input values. Now, these methods will always return a float dtype. (GH 41137)

In [32]: df = pd.DataFrame({'a': [True], 'b': [1], 'c': [1.0]})

Previous behavior:

In [5]: df.groupby(df.index).mean()
Out[5]:
        a  b    c
0    True  1  1.0

New behavior:

In [33]: df.groupby(df.index).mean()
Out[33]: 
     a    b    c
0  1.0  1.0  1.0

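Try operating inplace when setting values with `loc` and `iloc`#

When setting an entire column using loc or iloc, pandas will try to insert the values into the existing data rather than create an entirely new array.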

In [34]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [35]: values = df.values

In [36]: new = np.array([5, 6, 7], dtype="int64")

In [37]: df.loc[[0, 1, 2], "A"] = new

In both the new and old behavior, the data in values is overwritten, but in the old behavior the dtype of df["A"] changed to int64.

Previous behavior:

In [1]: df.dtypes
Out[1]:
A    int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False

In pandas 1.3.0, df continues to share data with values.

New behavior:

In [38]: df.dtypes
Out[38]: 
A    float64
dtype: object

In [39]: np.shares_memory(df["A"], new)
Out[39]: False

In [40]: np.shares_memory(df["A"], values)
Out[40]: True

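Never operate inplace when setting `frame[keys] = values`#

When setting multiple columns using frame[keys] = values, new arrays will replace the pre-existing arrays for these keys, which will not be overwritten (GH 39510). As a result, the columns will retain the dtype(s) of values, never casting to the dtypes of the existing arrays.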

In [41]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [42]: df[["A"]] = 5

In the old behavior, 5 was cast to float64 and inserted into the existing array backing df:

Previous behavior:

In [1]: df.dtypes
Out[1]:
A    float64

In the new behavior, we get a new array, and retain an integer-dtyped 5:

New behavior:

In [43]: df.dtypes
Out[43]: 
A    int64
dtype: object

Consistent casting when setting into a boolean Series#

Setting non-boolean values into a Series with dtype=bool now consistently casts to dtype=object (GH 38709)

In [1]: orig = pd.Series([True, False])

In [2]: ser = orig.copy()

In [3]: ser.iloc[1] = np.nan

In [4]: ser2 = orig.copy()

In [5]: ser2.iloc[1] = 2.0

Previous behavior:

In [1]: ser
Out[1]:
0    1.0
1    NaN
dtype: float64

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object

New behavior:

In [1]: ser
Out[1]:
0    True
1     NaN
dtype: object

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object

DataFrameGroupBy.rolling and SeriesGroupBy.rolling no longer return grouped-by column in values#

The group-by column will now be dropped from the result of a groupby.rolling operation (GH 32262)

In [44]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})

In [45]: df
Out[45]: 
   A  B
0  1  0
1  1  1
2  2  2
3  3  3

Previous behavior:

In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
       A    B
A
1 0  NaN  NaN
  1  2.0  1.0
2 2  NaN  NaN
3 3  NaN  NaN

New behavior:

In [46]: df.groupby("A").rolling(2).sum()
Out[46]: 
       B
A       
1 0  NaN
  1  1.0
2 2  NaN
3 3  NaN

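Removing artificial truncation in rolling variance and standard deviation#

Rolling.std() and Rolling.var() will no longer artificially truncate results that are less than ~1e-8 and ~1e-15 respectively to zero (GH 37051, GH 40448, GH 39872).

However, floating point artifacts may now exist in the results when rolling over larger values.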

In [47]: s = pd.Series([7, 5, 5, 5])

In [48]: s.rolling(3).var()
Out[48]: 
0         NaN
1         NaN
2    1.333333
3    0.000000
dtype: float64

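DataFrameGroupBy.rolling and SeriesGroupBy.rolling with MultiIndex no longer drop levels in the result#

DataFrameGroupBy.rolling() and SeriesGroupBy.rolling() will no longer drop levels of a DataFrame with a MultiIndex in the result. This can lead to a perceived duplication of levels in the resulting MultiIndex, but this change restores the behavior that was present in version 1.1.3 (GH 38787, GH 38523).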

In [49]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])

In [50]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)

In [51]: df
Out[51]: 
               a  b
label1 label2      
idx1   idx2    1  2

Previous behavior:

In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
          a    b
label1
idx1    1.0  2.0

New behavior:

In [52]: df.groupby('label1').rolling(1).sum()
Out[52]: 
                        a    b
label1 label1 label2          
idx1   idx1   idx2    1.0  2.0

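Backwards incompatible API changes#

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package	Minimum Version	Required	Changed
numpy	1.17.3	X	X
pytz	2017.3	X
python-dateutil	2.7.3	X
bottleneck	1.2.1
numexpr	2.7.0		X
pytest (dev)	6.0		X
mypy (dev)	0.812		X
setuptools	38.6.0		X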

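For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed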
beautifulsoup4	4.6.0
fastparquet	0.4.0	X
fsspec	0.7.4
gcsfs	0.6.0
lxml	4.3.0
matplotlib	2.2.3
numba	0.46.0
openpyxl	3.0.0	X
pyarrow	0.17.0	X
pymysql	0.8.1	X
pytables	3.5.1
s3fs	0.4.0
scipy	1.2.0
sqlalchemy	1.3.0	X
tabulate	0.8.7	X
xarray	0.12.0
xlrd	1.2.0
xlsxwriter	1.0.2
xlwt	1.3.0
pandas-gbq	0.12.0

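See Dependencies and Optional dependencies for more.

Other API changes#

Partially initialized CategoricalDtype objects (i.e. those with categories=None) will no longer compare as equal to fully initialized dtype objects (GH 38516)
Accessing _constructor_expanddim on a DataFrame and _constructor_sliced on a Series now raises an AttributeError. Previously a NotImplementedError was raised (GH 38782)
Added new engine and **engine_kwargs parameters to DataFrame.to_sql() to support other future "SQL engines". Currently we still only use SQLAlchemy under the hood, but more engines are planned to be supported, such as turbodbc (GH 36893)
Removed redundant freq from PeriodIndex string representation (GH 41653)
ExtensionDtype.construct_array_type() is now a required method instead of an optional one for ExtensionDtype subclasses (GH 24860)
Calling hash on non-hashable pandas objects will now raise TypeError with the built-in error message (e.g. unhashable type: 'Series'). Previously it would raise a custom message such as 'Series' objects are mutable, thus they cannot be hashed. Furthermore, isinstance(<Series>, collections.abc.Hashable) will now return False (GH 40013)
Styler.from_custom_template() now has two new arguments for template names, and the old name argument was removed, due to template inheritance having been introduced for better parsing (GH 42053). Subclassing modifications to Styler attributes are also needed.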
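
As a rough illustration of two of the API changes above, the built-in TypeError from hash and the stricter CategoricalDtype equality can be checked as follows. The Series and dtype objects are hypothetical examples chosen for this sketch:

```python
from collections.abc import Hashable

import pandas as pd

ser = pd.Series([1, 2, 3])

# hash() on a Series raises the built-in "unhashable type" TypeError.
try:
    hash(ser)
    hash_error = None
except TypeError as err:
    hash_error = str(err)

# A Series is no longer reported as Hashable.
is_hashable = isinstance(ser, Hashable)

# A partially initialized dtype (categories=None) is not equal to a fully
# initialized one, although it still compares equal to the string "category".
partial = pd.CategoricalDtype()
full = pd.CategoricalDtype(["a", "b"])
partial_equals_full = partial == full
partial_equals_string = partial == "category"
```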

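Build#

Documentation in .pptx and .pdf formats is no longer included in wheels or source distributions. (GH 30741)

Deprecations#

Deprecated dropping nuisance columns in DataFrame reductions and DataFrameGroupBy operations#

When calling a reduction (e.g. .min, .max, .sum) on a DataFrame with numeric_only=None (the default), columns where the reduction raises a TypeError are silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the TypeError will be raised, and users will need to select only valid columns before calling the function.

For example: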

In [53]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [54]: df
Out[54]: 
   A          B
0  1 2016-01-01
1  2 2016-01-02
2  3 2016-01-03
3  4 2016-01-04

Old behavior:

In [3]: df.prod()
Out[3]:
A    24
dtype: int64

Future behavior:

In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'

In [5]: df[["A"]].prod()
Out[5]:
A    24
dtype: int64

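Similarly, when applying a function to DataFrameGroupBy, columns on which the function raises TypeError are currently silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the TypeError will be raised, and users will need to select only valid columns before calling the function.

For example: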

In [55]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [56]: gb = df.groupby([1, 1, 2, 2])

Old behavior:

In [4]: gb.prod(numeric_only=False)
Out[4]:
    A
1   2
2  12

Future behavior:

In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations

In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
    A
1   2
2  12

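Other Deprecations#

Deprecated allowing scalars to be passed to the Categorical constructor (GH 38433)
Deprecated constructing CategoricalIndex without passing list-like data (GH 38944)
Deprecated allowing subclass-specific keyword arguments in the Index constructor, use the specific subclass directly instead (GH 14093, GH 21311, GH 22315, GH 26974)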
Deprecated the astype() method of datetimelike (timedelta64[ns], datetime64[ns], DatetimeTZDtype, PeriodDtype) to convert to integer dtypes, use values.view(...) instead (GH 38544). This deprecation was later reverted in pandas 1.4.0.
Deprecated MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth(), use MultiIndex.is_monotonic_increasing() instead (GH 32259)
Deprecated keyword try_cast in Series.where(), Series.mask(), DataFrame.where(), DataFrame.mask(); cast results manually if desired (GH 38836)
Deprecated comparison of Timestamp objects with datetime.date objects. Use e.g. ts <= pd.Timestamp(mydate) or ts.date() <= mydate instead (GH 36131)
Deprecated Rolling.win_type returning "freq" (GH 38963)
Deprecated Rolling.is_datetimelike (GH 38963)
Deprecated DataFrame indexer for Series.__setitem__() and DataFrame.__setitem__() (GH 39004)
Deprecated ExponentialMovingWindow.vol() (GH 39220)
Using .astype to convert between datetime64[ns] dtype and DatetimeTZDtype is deprecated and will raise in a future version, use obj.tz_localize or obj.dt.tz_localize instead (GH 38622)
Deprecated casting datetime.date objects to datetime64 when used as fill_value in DataFrame.unstack(), DataFrame.shift(), Series.shift(), and DataFrame.reindex(), pass pd.Timestamp(dateobj) instead (GH 39767)
Deprecated Styler.set_na_rep() and Styler.set_precision() in favor of Styler.format() with na_rep and precision as existing and new input arguments respectively (GH 40134, GH 40425)
Deprecated Styler.where() in favor of using an alternative formulation with Styler.applymap() (GH 40821)
Deprecated allowing partial failure in Series.transform() and DataFrame.transform() when func is list-like or dict-like and raises anything but TypeError; func raising anything but a TypeError will raise in a future version (GH 40211)
Deprecated arguments error_bad_lines and warn_bad_lines in read_csv() and read_table() in favor of argument on_bad_lines (GH 15122)
Deprecated support for np.ma.mrecords.MaskedRecords in the DataFrame constructor, pass {name: data[name] for name in data.dtype.names} instead (GH 40363)
Deprecated using merge(), DataFrame.merge(), and DataFrame.join() on a different number of levels (GH 34862)
Deprecated the use of **kwargs in ExcelWriter; use the keyword argument engine_kwargs instead (GH 40430)
Deprecated the level keyword for DataFrame and Series aggregations; use groupby instead (GH 39983)
Deprecated the inplace parameter of Categorical.remove_categories(), Categorical.add_categories(), Categorical.reorder_categories(), Categorical.rename_categories(), Categorical.set_categories(); it will be removed in a future version (GH 37643)
Deprecated merge() producing duplicated columns through the suffixes keyword and already existing columns (GH 22818)
Deprecated setting Categorical._codes, create a new Categorical with the desired codes instead (GH 40606)
Deprecated the convert_float optional argument in read_excel() and ExcelFile.parse() (GH 41127)
Deprecated behavior of DatetimeIndex.union() with mixed timezones; in a future version both will be cast to UTC instead of object dtype (GH 39328)
Deprecated using usecols with out of bounds indices for read_csv() with engine="c" (GH 25623)
Deprecated special treatment of lists with first element a Categorical in the DataFrame constructor; pass as pd.DataFrame({col: categorical, ...}) instead (GH 38845)
Deprecated behavior of the DataFrame constructor when a dtype is passed and the data cannot be cast to that dtype. In a future version, this will raise instead of being silently ignored (GH 24435)
Deprecated the Timestamp.freq attribute. For the properties that use it (is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end), when you have a freq, use e.g. freq.is_month_start(ts) (GH 15146)
Deprecated construction of Series or DataFrame with DatetimeTZDtype data and datetime64[ns] dtype. Use Series(data).dt.tz_localize(None) instead (GH 41555, GH 33401)
Deprecated behavior of Series construction with large-integer values and small-integer dtype silently overflowing; use Series(data).astype(dtype) instead (GH 41734)
Deprecated behavior of DataFrame construction with floating data and integer dtype casting even when lossy; in a future version this will remain floating, matching Series behavior (GH 41770)
Deprecated inference of timedelta64[ns], datetime64[ns], or DatetimeTZDtype dtypes in Series construction when data containing strings is passed and no dtype is passed (GH 33558)
In a future version, constructing Series or DataFrame with datetime64[ns] data and DatetimeTZDtype will treat the data as wall-times instead of as UTC times (matching DatetimeIndex behavior). To treat the data as UTC times, use pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz) or pd.Series(data.view("int64"), dtype=dtype) (GH 33401)
Deprecated passing lists as key to DataFrame.xs() and Series.xs() (GH 41760)
Deprecated boolean arguments of inclusive in Series.between() to have {"left", "right", "neither", "both"} as standard argument values (GH 40628)
Deprecated passing arguments as positional for all of the following, with exceptions noted (GH 41485):
- concat() (other than objs)
- read_csv() (other than filepath_or_buffer)
- read_table() (other than filepath_or_buffer)
- DataFrame.clip() and Series.clip() (other than upper and lower)
- DataFrame.drop_duplicates() (except for subset), Series.drop_duplicates(), Index.drop_duplicates() and MultiIndex.drop_duplicates()
- DataFrame.drop() (other than labels) and Series.drop()
- DataFrame.dropna() and Series.dropna()
- DataFrame.ffill(), Series.ffill(), DataFrame.bfill(), and Series.bfill()
- DataFrame.fillna() and Series.fillna() (apart from value)
- DataFrame.interpolate() and Series.interpolate() (other than method)
- DataFrame.mask() and Series.mask() (other than cond and other)
- DataFrame.reset_index() (other than level) and Series.reset_index()
- DataFrame.set_axis() and Series.set_axis() (other than labels)
- DataFrame.set_index() (other than keys)
- DataFrame.sort_index() and Series.sort_index()
- DataFrame.sort_values() (other than by) and Series.sort_values()
- DataFrame.where() and Series.where() (other than cond and other)
- Index.set_names() and MultiIndex.set_names() (except for names)
- MultiIndex.set_codes() (except for codes)
- MultiIndex.set_levels() (except for levels)
- Resampler.interpolate() (other than method)
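
The replacements suggested in the deprecations above can be sketched as follows. The CSV text, dates and index labels are invented for illustration; only the parameters and methods named in the list are relied on:

```python
import datetime as dt
from io import StringIO

import pandas as pd

# on_bad_lines replaces error_bad_lines / warn_bad_lines in read_csv().
# The middle data row has three fields instead of two and is skipped.
raw = "a,b\n1,2\n3,4,5\n6,7\n"
df = pd.read_csv(StringIO(raw), on_bad_lines="skip")

# String values for `inclusive` replace the boolean flag in Series.between().
mask = pd.Series([1, 2, 3]).between(1, 3, inclusive="neither")

# Compare a Timestamp with a datetime.date by converting one side explicitly.
ts = pd.Timestamp("2021-07-02 12:00")
mydate = dt.date(2021, 7, 2)
same_or_earlier = ts.date() <= mydate
strictly_later = ts > pd.Timestamp(mydate)

# groupby(level=...) replaces the `level` keyword of aggregations.
index = pd.MultiIndex.from_tuples([("x", 1), ("x", 2), ("y", 1)])
ser = pd.Series([1, 2, 3], index=index)
per_group = ser.groupby(level=0).sum()
```

Passing everything except the first argument by keyword, as done here, also avoids the positional-argument deprecation.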

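Performance improvements#

Performance improvement in IntervalIndex.isin() (GH 38353)
Performance improvement in Series.mean() for nullable data types (GH 34814)
Performance improvement in Series.isin() for nullable data types (GH 38340)
Performance improvement in DataFrame.fillna() with method="pad" or method="backfill" for nullable floating and nullable integer dtypes (GH 39953)
Performance improvement in DataFrame.corr() for method=kendall (GH 28329)
Performance improvement in DataFrame.corr() for method=spearman (GH 40956, GH 41885)
Performance improvement in Rolling.corr() and Rolling.cov() (GH 39388)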
Performance improvement in RollingGroupby.corr(), RollingGroupby.cov(), ExpandingGroupby.corr() and ExpandingGroupby.cov() (GH 39591)
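Performance improvement in unique() for object data type (GH 37615)
Performance improvement in json_normalize() for basic cases (including separators) (GH 40035, GH 15621)
Performance improvement in ExpandingGroupby aggregation methods (GH 39664)
Performance improvement in Styler where render times are more than 50% reduced and now match DataFrame.to_html() (GH 39972, GH 39952, GH 40425)
The method Styler.set_td_classes() is now as performant as Styler.apply() and Styler.applymap(), and even more so in some cases (GH 40453)
Performance improvement in ExponentialMovingWindow.mean() with times (GH 39784)
Performance improvement in DataFrameGroupBy.apply() and SeriesGroupBy.apply() when requiring the Python fallback implementation (GH 40176)
Performance improvement in the conversion of a PyArrow Boolean array to a pandas nullable Boolean array (GH 41051)
Performance improvement for concatenation of data with type CategoricalDtype (GH 40193)
Performance improvement in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax(), and SeriesGroupBy.cummax() with nullable data types (GH 37493)
Performance improvement in Series.nunique() with nan values (GH 40865)
Performance improvement in DataFrame.transpose() and Series.unstack() with DatetimeTZDtype (GH 40149)
Performance improvement in Series.plot() and DataFrame.plot() with entry point lazy loading (GH 41492)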

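Bug fixes#

Categorical#

Bug in CategoricalIndex incorrectly failing to raise TypeError when scalar data is passed (GH 38614)
Bug in CategoricalIndex.reindex failing when the Index passed was not categorical but its values were all labels in the category (GH 28690)
Bug where constructing a Categorical from an object-dtype array of date objects did not round-trip correctly with astype (GH 38552)
Bug in constructing a DataFrame from an ndarray and a CategoricalDtype (GH 38857)
Bug in setting categorical values into an object-dtype column in a DataFrame (GH 39136)
Bug in DataFrame.reindex() raising an IndexError when the new index contained duplicates and the old index was a CategoricalIndex (GH 38906)
Bug in Categorical.fillna() with a tuple-like category raising NotImplementedError instead of ValueError when filling with a non-category tuple (GH 41914)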

Datetimelike#

Bug in DataFrame and Series constructors sometimes dropping nanoseconds from Timestamp (resp. Timedelta) data, with dtype=datetime64[ns] (resp. timedelta64[ns]) (GH 38032)
Bug in DataFrame.first() and Series.first() with an offset of one month returning an incorrect result when the first day is the last day of a month (GH 29623)
Bug in constructing a DataFrame or Series with mismatched datetime64 data and timedelta64 dtype, or vice-versa, failing to raise a TypeError (GH 38575, GH 38764, GH 38792)
Bug in constructing a Series or DataFrame with a datetime object out of bounds for datetime64[ns] dtype or a timedelta object out of bounds for timedelta64[ns] dtype (GH 38792, GH 38965)
Bug in DatetimeIndex.intersection(), DatetimeIndex.symmetric_difference(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH 38741)
Bug in DatetimeIndex.intersection() giving incorrect results with non-Tick frequencies with n != 1 (GH 42104)
Bug in Series.where() incorrectly casting datetime64 values to int64 (GH 37682)
Bug in Categorical incorrectly typecasting datetime objects to Timestamp (GH 38878)
Bug in comparisons between Timestamp objects and datetime64 objects just outside the implementation bounds for nanosecond datetime64 (GH 39221)
Bug in Timestamp.round(), Timestamp.floor(), Timestamp.ceil() for values near the implementation bounds of Timestamp (GH 39244)
Bug in Timedelta.round(), Timedelta.floor(), Timedelta.ceil() for values near the implementation bounds of Timedelta (GH 38964)
Bug in date_range() incorrectly creating a DatetimeIndex containing NaT instead of raising OutOfBoundsDatetime in corner cases (GH 24124)
Bug in infer_freq() incorrectly failing to infer the 'H' frequency of a DatetimeIndex that has a timezone and crosses DST boundaries (GH 39556)
Bug in Series backed by DatetimeArray or TimedeltaArray sometimes failing to set the array's freq to None (GH 41425)

Timedelta#

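Bug in constructing Timedelta from np.timedelta64 objects with non-nanosecond units that are out of bounds for timedelta64[ns] (GH 38965)
Bug in constructing a TimedeltaIndex incorrectly accepting np.datetime64("NaT") objects (GH 39462)
Bug in constructing Timedelta from an input string with only symbols and no digits failing to raise an error (GH 39710)
Bug in TimedeltaIndex and to_timedelta() failing to raise when passed non-nanosecond timedelta64 arrays that overflow when converting to timedelta64[ns] (GH 40008)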
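
A small sketch of the sign-only string fix (GH 39710). It assumes the error raised is a ValueError, and the helper name parses is hypothetical:

```python
import pandas as pd


def parses(text):
    """Return True if pd.Timedelta accepts the string, False on ValueError."""
    try:
        pd.Timedelta(text)
    except ValueError:
        return False
    return True


ordinary = parses("1 day")   # a normal duration string is accepted
plus_only = parses("+")      # symbols without digits are rejected
minus_only = parses("-")
```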

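Timezones#

Bug in different tzinfo objects representing UTC not being treated as equivalent (GH 39216)
Bug in dateutil.tz.gettz("UTC") not being recognized as equivalent to other UTC-representing tzinfos (GH 39276)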

Numeric#

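Bug in DataFrame.quantile() and DataFrame.sort_values() causing incorrect subsequent indexing behavior (GH 38351)
Bug in DataFrame.sort_values() raising an IndexError for empty by (GH 40258)
Bug in DataFrame.select_dtypes() with include=np.number dropping numeric ExtensionDtype columns (GH 35340)
Bug in DataFrame.mode() and Series.mode() not keeping a consistent integer Index for empty input (GH 33321)
Bug in DataFrame.rank() when the DataFrame contained np.inf (GH 32593)
Bug in DataFrame.rank() with axis=0 and columns holding incomparable types raising an IndexError (GH 38932)
Bug in Series.rank(), DataFrame.rank(), DataFrameGroupBy.rank(), and SeriesGroupBy.rank() treating the smallest possible int64 value as missing (GH 32859)
Bug in DataFrame.select_dtypes() behaving differently between Windows and Linux with include="int" (GH 36596)
Bug in DataFrame.apply() and DataFrame.agg() operating on the entire DataFrame instead of rows or columns when passed the argument func="size" (GH 39934)
Bug in DataFrame.transform() raising a SpecificationError when passed a dictionary and columns were missing; will now raise a KeyError instead (GH 40004)
Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() giving incorrect results with pct=True and equal values between consecutive groups (GH 40518)
Bug in Series.count() resulting in an int32 result on 32-bit platforms when argument level=None (GH 40908)
Bug in Series and DataFrame reductions with methods any and all not returning Boolean results for object data (GH 12863, GH 35450, GH 27709)
Bug in Series.clip() failing if the Series contains NA values and has nullable int or float as a data type (GH 40851)
Bug in UInt64Index.where() and UInt64Index.putmask() with an np.int64 dtype other incorrectly raising TypeError (GH 41974)
Bug in DataFrame.agg() not sorting the aggregated axis in the order of the provided aggregation functions when one or more aggregation functions fail to produce results (GH 33634)
Bug in DataFrame.clip() not interpreting missing values as no threshold (GH 40420)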

Conversion#

Bug in DataFrame.to_dict() with orient='records' now returns Python native types (GH 25969)
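Bug in Series.view() and Index.view() when converting between datetime-like (datetime64[ns], datetime64[ns, tz], timedelta64, period) dtypes (GH 39788)
Bug in creating a DataFrame from an empty np.recarray not retaining the original dtypes (GH 40121)
Bug in DataFrame failing to raise a TypeError when constructing from a frozenset (GH 40163)
Bug in Index construction silently ignoring a passed dtype when the data cannot be cast to that dtype (GH 21311)
Bug in StringArray.astype() falling back to NumPy and raising when converting to dtype='categorical' (GH 40450)
Bug in factorize() where, when given an array with a numeric NumPy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (GH 41132)
Bug in DataFrame construction with a dictionary containing an array-like with ExtensionDtype and copy=True failing to make a copy (GH 38939)
Bug in qcut() raising an error when taking Float64DType as input (GH 40730)
Bug in DataFrame and Series construction with datetime64[ns] data and dtype=object resulting in datetime objects instead of Timestamp objects (GH 41599)
Bug in DataFrame and Series construction with timedelta64[ns] data and dtype=object resulting in np.timedelta64 objects instead of Timedelta objects (GH 41599)
Bug in DataFrame construction when given a two-dimensional object-dtype np.ndarray of Period or Interval objects failing to cast to PeriodDtype or IntervalDtype, respectively (GH 41812)
Bug in constructing a Series from a list and a PandasDtype (GH 39357)
Bug in creating a Series from a range object that does not fit in the bounds of int64 dtype (GH 30173)
Bug in creating a Series from a dict with all-tuple keys and an Index that requires reindexing (GH 41707)
Bug in infer_dtype() not recognizing Series, Index, or array with a Period dtype (GH 23553)
Bug in infer_dtype() raising an error for general ExtensionArray objects. It will now return "unknown-array" instead of raising (GH 37367)
Bug in DataFrame.convert_dtypes() incorrectly raising a ValueError when called on an empty DataFrame (GH 40393)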
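
A short sketch of three of the conversion fixes above, on made-up inputs. The orient='records' check is applied to a DataFrame, which is an assumption about where that fix applies since orient is a DataFrame.to_dict() parameter:

```python
import numpy as np
import pandas as pd

# factorize() keeps the original NumPy dtype of the unique values.
codes, uniques = pd.factorize(np.array([3, 1, 3], dtype="int8"))

# to_dict(orient="records") yields Python native scalars, not NumPy scalars.
records = pd.DataFrame({"a": [1], "b": [0.5]}).to_dict(orient="records")

# convert_dtypes() on an empty DataFrame returns an empty DataFrame.
empty = pd.DataFrame().convert_dtypes()
```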

Strings#

Bug in the conversion from pyarrow.ChunkedArray to StringArray when the original had zero chunks (GH 41040)
Bug in Series.replace() and DataFrame.replace() ignoring replacements with regex=True for StringDType data (GH 41333, GH 35977)
Bug in Series.str.extract() with StringArray returning object dtype for an empty DataFrame (GH 41441)
Bug in Series.str.replace() where the case argument was ignored when regex=False (GH 41602)

Interval#

Bug in IntervalIndex.intersection() and IntervalIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH 38653, GH 38741)
Bug in IntervalIndex.intersection() returning duplicates when at least one of the Index objects has duplicates which are present in the other (GH 38743)
IntervalIndex.union(), IntervalIndex.intersection(), IntervalIndex.difference(), and IntervalIndex.symmetric_difference() now cast to the appropriate dtype instead of raising a TypeError when operating with another IntervalIndex with incompatible dtype (GH 39267)
PeriodIndex.union(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference(), PeriodIndex.difference() now cast to object dtype instead of raising IncompatibleFrequency when operating with another PeriodIndex with incompatible dtype (GH 39306)
Bug in IntervalIndex.is_monotonic(), IntervalIndex.get_loc(), IntervalIndex.get_indexer_for(), and IntervalIndex.__contains__() when NA values are present (GH 41831)

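Indexing#

Bug in Index.union() and MultiIndex.union() dropping duplicate Index values when Index was not monotonic or sort was set to False (GH 36289, GH 31326, GH 40862)
Bug in CategoricalIndex.get_indexer() failing to raise InvalidIndexError when non-unique (GH 38372)
Bug in IntervalIndex.get_indexer() when target has CategoricalDtype and both the index and the target contain NA values (GH 41934)
Bug in Series.loc() raising a ValueError when input was filtered with a Boolean list and values to set were a list with lower dimension (GH 20438)
Bug in inserting many new columns into a DataFrame causing incorrect subsequent indexing behavior (GH 38380)
Bug in DataFrame.__setitem__() raising a ValueError when setting multiple values to duplicate columns (GH 15695)
Bug in DataFrame.loc(), Series.loc(), DataFrame.__getitem__() and Series.__getitem__() returning incorrect elements for non-monotonic DatetimeIndex for string slices (GH 33146)
Bug in DataFrame.reindex() and Series.reindex() with timezone aware indexes raising a TypeError for method="ffill" and method="bfill" and specified tolerance (GH 38566)
Bug in DataFrame.reindex() with datetime64[ns] or timedelta64[ns] incorrectly casting to integers when the fill_value requires casting to object dtype (GH 39755)
Bug in DataFrame.__setitem__() raising a ValueError when setting on an empty DataFrame using specified columns and a nonempty DataFrame value (GH 38831)
Bug in DataFrame.loc.__setitem__() raising a ValueError when operating on a unique column when the DataFrame has duplicate columns (GH 38521)
Bug in DataFrame.iloc.__setitem__() and DataFrame.loc.__setitem__() with mixed dtypes when setting with a dictionary value (GH 38335)
Bug in Series.loc.__setitem__() and DataFrame.loc.__setitem__() raising KeyError when provided a Boolean generator (GH 39614)
Bug in Series.iloc() and DataFrame.iloc() raising a KeyError when provided a generator (GH 39614)
Bug in DataFrame.__setitem__() not raising a ValueError when the right hand side is a DataFrame with the wrong number of columns (GH 38604)
Bug in Series.__setitem__() raising a ValueError when setting a Series with a scalar indexer (GH 38303)
Bug in DataFrame.loc() dropping levels of a MultiIndex when the DataFrame used as input has only one row (GH 10521)
Bug in DataFrame.__getitem__() and Series.__getitem__() always raising KeyError when slicing with existing strings where the Index has milliseconds (GH 33589)
Bug in setting timedelta64 or datetime64 values into a numeric Series failing to cast to object dtype (GH 39086, GH 39619)
Bug in setting Interval values into a Series or DataFrame with mismatched IntervalDtype incorrectly casting the new values to the existing dtype (GH 39120)
Bug in setting datetime64 values into a Series with integer-dtype incorrectly casting the datetime64 values to integers (GH 39266)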
Bug in setting np.datetime64("NaT") into a Series with DatetimeTZDtype incorrectly treating the timezone-naive value as timezone-aware (GH 39769)
在 Index.get_loc() 中存在一个错误，当 key=NaN 且指定了 method 但 Index 中不存在 NaN 时，未引发 KeyError (GH 39382)
在 DatetimeIndex.insert() 中插入 np.datetime64("NaT") 到时区感知的索引时，错误地将无时区值视为有时区值 (GH 39769)
在 Index.insert() 中错误地引发问题，当设置一个新列时，该列无法在现有的 frame.columns 中保存，或在 Series.reset_index() 或 DataFrame.reset_index() 中，而不是转换为兼容的 dtype (GH 39068)
在 RangeIndex.append() 中的一个错误，其中长度为1的单个对象被错误地连接 (GH 39401)
在 RangeIndex.astype() 中的错误，当转换为 CategoricalIndex 时，类别变成了 Int64Index 而不是 RangeIndex (GH 41263)
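Bug in setting numpy.timedelta64 values into an object-dtype Series using a Boolean indexer (GH 39488)
Bug in setting numeric values into a boolean-dtype Series using at or iat failing to cast to object dtype (GH 39582)
Bug in DataFrame.__setitem__() and DataFrame.iloc.__setitem__() raising ValueError when trying to index with a row slice and setting a list as values (GH 40440)
Bug in DataFrame.loc() not raising KeyError when the key was not found in a MultiIndex and the levels were not fully specified (GH 41170)
Bug in DataFrame.loc.__setitem__() when setting-with-expansion incorrectly raising when the index in the expanding axis contained duplicates (GH 40096)
Bug in DataFrame.loc.__getitem__() with a MultiIndex casting to float when at least one index column has float dtype and a scalar is retrieved (GH 41369)
Bug in DataFrame.loc() incorrectly matching non-Boolean index elements (GH 20432)
Bug in indexing with np.nan on a Series or DataFrame with a CategoricalIndex incorrectly raising KeyError when np.nan keys are present (GH 41933)
Bug in Series.__delitem__() with ExtensionDtype incorrectly casting to ndarray (GH 40386)
Bug in DataFrame.at() with a CategoricalIndex returning incorrect results when passed integer keys (GH 41846)
Bug in DataFrame.loc() returning a MultiIndex in the wrong order if an indexer has duplicates (GH 40978)
Bug in DataFrame.__setitem__() raising a TypeError when using a str subclass as the column name with a DatetimeIndex (GH 37366)
Bug in PeriodIndex.get_loc() failing to raise a KeyError when given a Period with a mismatched freq (GH 41670)
Bug in .loc.__getitem__ with a UInt64Index and negative-integer keys raising OverflowError instead of KeyError in some cases, and wrapping around to positive integers in others (GH 41777)
Bug in Index.get_indexer() failing to raise ValueError in some cases with invalid method, limit, or tolerance arguments (GH 41918)
Bug when slicing a Series or DataFrame with a TimedeltaIndex where passing an invalid string raised ValueError instead of TypeError (GH 41821)
Bug in the Index constructor sometimes silently ignoring a specified dtype (GH 38879)
Index.where() behavior now mirrors Index.putmask() behavior, i.e. index.where(mask, other) matches index.putmask(~mask, other) (GH 39412)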
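
The relationship between Index.where() and Index.putmask() stated in the last entry can be checked with a small sketch. The index values, the mask, and the replacement value 0 are illustrative assumptions, not taken from the release notes.

```python
import numpy as np
import pandas as pd

# Illustrative data: keep positions 0 and 2, replace the rest with 0.
idx = pd.Index([10, 20, 30, 40])
mask = np.array([True, False, True, False])

# where() keeps values where the mask is True ...
kept = idx.where(mask, 0)
# ... while putmask() replaces values where its mask is True,
# so the inverted mask should give the same result.
replaced = idx.putmask(~mask, 0)

print(kept.equals(replaced))
```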

Missing#

Bug in Grouper not correctly propagating the dropna argument; DataFrameGroupBy.transform() now correctly handles missing values for dropna=True (GH 35612)
Bug in isna(), Series.isna(), Index.isna(), DataFrame.isna(), and the corresponding notna functions not recognizing Decimal("NaN") objects (GH 39409)
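Bug in DataFrame.fillna() not accepting a dictionary for the downcast keyword (GH 40809)
Bug in isna() not returning a copy of the mask for nullable types, causing any subsequent mask modification to change the original array (GH 40935)
Bug in DataFrame construction with float data containing NaN and an integer dtype casting instead of retaining the NaN (GH 26919)
Bug in Series.isin() and MultiIndex.isin() not treating all NaN values as equivalent when they were inside tuples (GH 41836)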
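
As a brief illustration of the Decimal("NaN") entry in this list, the following sketch shows the behavior expected from pandas 1.3.0 onward. The sample values are made up for the example.

```python
from decimal import Decimal

import pandas as pd

# Object-dtype Series mixing a regular Decimal, a Decimal NaN, and None.
values = pd.Series([Decimal("1.5"), Decimal("NaN"), None], dtype=object)

# Both the scalar and the Series-level checks should flag Decimal("NaN").
scalar_is_missing = pd.isna(Decimal("NaN"))
missing_mask = values.isna()

print(scalar_is_missing)
print(missing_mask.tolist())
```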

MultiIndex#

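Bug in DataFrame.drop() raising a TypeError when the MultiIndex is non-unique and level is not provided (GH 36293)
Bug in MultiIndex.intersection() duplicating NaN in the result (GH 38623)
Bug in MultiIndex.equals() incorrectly returning True when the MultiIndex contained NaN, even when the entries were ordered differently (GH 38439)
Bug in MultiIndex.intersection() always returning an empty result when intersecting with a CategoricalIndex (GH 38653)
Bug in MultiIndex.difference() incorrectly raising TypeError when the indexes contain non-sortable entries (GH 41915)
Bug in MultiIndex.reindex() raising a ValueError when used on an empty MultiIndex and indexing only a specific level (GH 41170)
Bug in MultiIndex.reindex() raising TypeError when reindexing against a flat Index (GH 41707)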
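
A minimal sketch of the MultiIndex.equals() fix listed above, assuming two indexes that hold the same tuples in a different order. The tuple values are illustrative only.

```python
import numpy as np
import pandas as pd

# Same entries, different order; both contain NaN.
first = pd.MultiIndex.from_tuples([(81.0, np.nan), (np.nan, np.nan)])
second = pd.MultiIndex.from_tuples([(np.nan, np.nan), (81.0, np.nan)])

# Ordering matters for equals(), so these should not compare equal,
# while an index compared with its own copy should.
different_order_equal = first.equals(second)
same_order_equal = first.equals(first.copy())

print(different_order_equal, same_order_equal)
```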

I/O#

Bug in Index.__repr__() when display.max_seq_items=1 (GH 38415)
Bug in read_csv() not recognizing scientific notation if the argument decimal is set and engine="python" (GH 31920)
Bug in read_csv() interpreting an NA value as a comment when the NA value contains the comment string; fixed for engine="python" (GH 34002)
Bug in read_csv() raising an IndexError when multiple header columns and index_col are specified and the file has no data rows (GH 38292)
Bug in read_csv() not accepting usecols with a different length than names for engine="python" (GH 16469)
Bug in read_csv() returning object dtype when delimiter="," with usecols and parse_dates specified for engine="python" (GH 35873)
Bug in read_csv() raising a TypeError when names and parse_dates is specified for engine="c" (GH 33699)
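Bug in read_clipboard() and DataFrame.to_clipboard() not working in WSL (GH 38527)
Allow custom error values for the parse_dates argument of read_sql(), read_sql_query() and read_sql_table() (GH 35185)
Bug in DataFrame.to_hdf() and Series.to_hdf() raising a KeyError when applied to subclasses of DataFrame or Series (GH 33748)
Bug in HDFStore.put() raising the wrong TypeError when saving a DataFrame with a non-string dtype (GH 34274)
Bug in json_normalize() resulting in the first element of a generator object not being included in the returned DataFrame (GH 35923)
Bug in read_csv() applying the thousands separator to date columns when the column should be parsed as dates and usecols is specified for engine="python" (GH 39365)
Bug in read_excel() forward filling MultiIndex names when multiple header and index columns are specified (GH 34673)
Bug in read_excel() not respecting set_option() (GH 34252)
Bug in read_csv() not switching true_values and false_values for the nullable Boolean dtype (GH 34655)
Bug in read_json() when orient="split" not maintaining a numeric string index (GH 28556)
read_sql() returned an empty generator if chunksize was non-zero and the query returned no results. It now returns a generator with a single empty DataFrame (GH 34411)
Bug in read_hdf() returning unexpected records when filtering on categorical string columns using the where parameter (GH 39189)
Bug in read_sas() raising a ValueError when datetimes were null (GH 39725)
Bug in read_excel() dropping empty values from single-column spreadsheets (GH 39808)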
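Bug in read_excel() loading trailing empty rows/columns for some file types (GH 41167)
Bug in read_excel() raising an AttributeError when the Excel file had a MultiIndex header followed by two empty rows and no index (GH 40442)
Bug in read_excel(), read_csv(), read_table(), read_fwf(), and read_clipboard() where one blank row after a MultiIndex header with no index would be dropped (GH 40442)
Bug in DataFrame.to_string() misplacing the truncation column when index=False (GH 40904)
Bug in DataFrame.to_string() adding an extra dot and misaligning the truncation row when index=False (GH 40904)
Bug in read_orc() always raising an AttributeError (GH 40918)
Bug in read_csv() and read_table() silently ignoring prefix if both names and prefix are defined; this now raises a ValueError (GH 39123)
Bug in read_csv() and read_excel() not respecting the dtype for a duplicated column name when mangle_dupe_cols is set to True (GH 35211)
Bug in read_csv() silently ignoring sep if both delimiter and sep are defined; this now raises a ValueError (GH 39823)
Bug in read_csv() and read_table() misinterpreting arguments when sys.setprofile had been previously called (GH 41069)
Bug in the conversion from PyArrow to pandas (e.g. when reading Parquet) with nullable dtypes and a PyArrow array whose data buffer size is not a multiple of the dtype size (GH 40896)
Bug in read_excel() raising an error when pandas could not determine the file type, even though the user specified the engine argument (GH 41225)
Bug in read_clipboard() shifting values into the wrong column when copying from an Excel file with null values in the first column (GH 41108)
Bug in DataFrame.to_hdf() and Series.to_hdf() raising a TypeError when trying to append a string column to an incompatible column (GH 41897)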
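
The read_csv() entry about sep and delimiter (GH 39823) can be illustrated with a short sketch. The in-memory CSV text is a made-up example, and the snippet only assumes that a ValueError is raised, without relying on the exact message.

```python
import io

import pandas as pd

csv_text = "a;b\n1;2\n"  # illustrative semicolon-separated data

# Passing both sep and delimiter is ambiguous and should raise.
try:
    pd.read_csv(io.StringIO(csv_text), sep=";", delimiter=";")
    conflict_error = None
except ValueError as exc:
    conflict_error = exc

# Passing only one of the two arguments parses the data normally.
frame = pd.read_csv(io.StringIO(csv_text), sep=";")

print(type(conflict_error).__name__)
print(frame)
```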

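Period#

Comparisons of Period objects or Index, Series, or DataFrame with mismatched PeriodDtype now behave like other mismatched-type comparisons, returning False for equals, True for not-equal, and raising TypeError for inequality checks (GH 39274)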
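
A small sketch of the equality half of this change, using two scalar Period objects with different frequencies. The dates are illustrative; ordering comparisons such as < are deliberately left out because they raise.

```python
import pandas as pd

# Two periods with mismatched frequencies (monthly vs. daily).
monthly = pd.Period("2021-01", freq="M")
daily = pd.Period("2021-01-01", freq="D")

# Equality checks no longer raise; they report "not equal".
is_equal = monthly == daily
is_not_equal = monthly != daily

print(is_equal, is_not_equal)
```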

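Plotting#

Bug in plotting.scatter_matrix() raising when a 2d ax argument is passed (GH 16253)
Prevent warnings when Matplotlib's constrained_layout is enabled (GH 25261)
Bug in DataFrame.plot() showing the wrong colors in the legend if the function was called repeatedly and some calls used yerr while others did not (GH 39522)
Bug in DataFrame.plot() showing the wrong colors in the legend if the function was called repeatedly and some calls used secondary_y while others used legend=False (GH 40044)
Bug in DataFrame.plot.box() where the caps or min/max markers of the plot were not visible when the dark_background theme was selected (GH 40769)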

Groupby/resample/rolling#

Bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() with PeriodDtype columns incorrectly casting results too aggressively (GH 38254)
Bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical Series were not tallied (GH 38672)
Bug in SeriesGroupBy.value_counts() where an error was raised on an empty Series (GH 39172)
Bug in GroupBy.indices() containing non-existent indices when null values were present in the groupby keys (GH 9304)
Fixed bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() causing a loss of precision by now using Kahan summation (GH 38778)
Fixed bug in DataFrameGroupBy.cumsum(), SeriesGroupBy.cumsum(), DataFrameGroupBy.mean(), and SeriesGroupBy.mean() causing a loss of precision by now using Kahan summation (GH 38934)
Bug in Resampler.aggregate() and DataFrame.transform() raising a TypeError instead of SpecificationError when missing keys had mixed dtypes (GH 39025)
Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH 38733)
Bug in Series.resample() raising when the index was a PeriodIndex consisting of NaT (GH 39227)
Bug in RollingGroupby.corr() and ExpandingGroupby.corr() where the groupby column would return 0 instead of np.nan when providing an other that was longer than each group (GH 39591)
Bug in ExpandingGroupby.corr() and ExpandingGroupby.cov() where 1 would be returned instead of np.nan when providing an other that was longer than each group (GH 39591)
Bug in DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.median(), SeriesGroupBy.median(), and DataFrame.pivot_table() not propagating metadata (GH 28283)
Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly when the window is an offset and the dates are in descending order (GH 40002)
Bug in Series.groupby() and DataFrame.groupby() on an empty Series or DataFrame losing the index, columns, and/or data types when directly using the methods idxmax, idxmin, mad, min, max, sum, prod, and skew, or when using them through apply, aggregate, or resample (GH 26411)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() where a MultiIndex would be created instead of an Index when used on a RollingGroupby object (GH 39732)
Bug in DataFrameGroupBy.sample() where an error was raised when weights was specified and the index was an Int64Index (GH 39927)
Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() sometimes raising a SpecificationError when passed a dictionary and columns were missing; they now always raise a KeyError instead (GH 40004)
Bug in DataFrameGroupBy.sample() where column selection was not applied before computing the result (GH 39928)
Bug in ExponentialMovingWindow where calling __getitem__ would incorrectly raise a ValueError when providing times (GH 40164)
Bug in ExponentialMovingWindow where calling __getitem__ would not retain the com, span, alpha or halflife attributes (GH 40164)
ExponentialMovingWindow now raises a NotImplementedError when specifying times with adjust=False, because the calculation was incorrect (GH 40098)
Bug in ExponentialMovingWindowGroupby.mean() where the times argument was ignored when engine='numba' (GH 40951)
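Bug in ExponentialMovingWindowGroupby.mean() where the wrong times were used in the case of multiple groups (GH 40951)
Bug in ExponentialMovingWindowGroupby where the times vector and values became out of sync for non-trivial groups (GH 40951)
Bug in Series.asfreq() and DataFrame.asfreq() dropping rows when the index was not sorted (GH 39805)
Bug in aggregation functions for DataFrame not respecting the numeric_only argument when the level keyword was given (GH 40660)
Bug in SeriesGroupBy.aggregate() where using a user-defined function to aggregate a Series with an object-typed Index caused an incorrect Index shape (GH 40014)
Bug in RollingGroupby where the as_index=False argument in groupby was ignored (GH 39433)
Bug in DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all() and SeriesGroupBy.all() raising ValueError when used with nullable type columns holding NA, even with skipna=True (GH 40585)
Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() incorrectly rounding integer values near the int64 implementation bounds (GH 40767)
Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() with nullable dtypes incorrectly raising a TypeError (GH 41010)
Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() computing wrong results with nullable data types whose values are too large to round-trip when cast to float (GH 37493)
Bug in DataFrame.rolling() returning a mean of zero for an all-NaN window with min_periods=0 if the calculation is not numerically stable (GH 41053)
Bug in DataFrame.rolling() returning a non-zero sum for an all-NaN window with min_periods=0 if the calculation is not numerically stable (GH 41053)
Bug in SeriesGroupBy.agg() failing to retain an ordered CategoricalDtype on order-preserving aggregations (GH 41147)
Bug in DataFrameGroupBy.min(), SeriesGroupBy.min(), DataFrameGroupBy.max() and SeriesGroupBy.max() with multiple object-dtype columns and numeric_only=False incorrectly raising a ValueError (GH 41111)
Bug in DataFrameGroupBy.rank() with the GroupBy object's axis=0 and the rank method's keyword axis=1 (GH 41320)
Bug in DataFrameGroupBy.__getitem__() with non-unique columns incorrectly returning a malformed SeriesGroupBy instead of a DataFrameGroupBy (GH 41427)
Bug in DataFrameGroupBy.transform() with non-unique columns incorrectly raising an AttributeError (GH 41427)
Bug in Resampler.apply() with non-unique columns incorrectly dropping duplicated columns (GH 41445)
Bug in Series.groupby() aggregations incorrectly returning an empty Series instead of raising TypeError on aggregations that are invalid for its dtype, e.g. .prod with datetime64[ns] dtype (GH 41342)
Bug in DataFrameGroupBy aggregations incorrectly failing to drop columns with invalid dtypes for that aggregation when there are no valid columns (GH 41291)
Bug in DataFrame.rolling.__iter__() where on was not assigned to the index of the resulting objects (GH 40373)
Bug in DataFrameGroupBy.transform() and DataFrameGroupBy.agg() with engine="numba" where *args were being cached with the user-passed function (GH 41647)
Bug in the DataFrameGroupBy methods agg, transform, sum, bfill, ffill, pad, pct_change, shift, and ohlc dropping .columns.names (GH 41497)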
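
The GH 40004 entry above (a KeyError for missing columns in a dictionary aggregation) can be sketched as follows. The frame, the "key" and "value" columns, and the "missing" column name are all illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

# Aggregating a column that does not exist should raise KeyError.
try:
    df.groupby("key").agg({"missing": "sum"})
    agg_error = None
except KeyError as exc:
    agg_error = exc

# Aggregating an existing column works as usual.
totals = df.groupby("key").agg({"value": "sum"})

print(type(agg_error).__name__)
print(totals)
```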

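Reshaping#

Bug in merge() raising an error when performing an inner join with a partial index and right_index=True when there was no overlap between the indices (GH 33814)
Bug in DataFrame.unstack() with missing levels leading to incorrect index names (GH 37510)
Bug in merge_asof() propagating the right Index instead of the left Index when left_index=True and right_on are specified (GH 33463)
Bug in DataFrame.join() on a DataFrame with a MultiIndex returning the wrong result when one of the two indexes had only one level (GH 36909)
merge_asof() now raises a ValueError instead of a cryptic TypeError in the case of non-numerical merge columns (GH 29130)
Bug in DataFrame.join() not assigning values correctly when the DataFrame had a MultiIndex where at least one dimension had dtype Categorical with non-alphabetically sorted categories (GH 38502)
Series.value_counts() and Series.mode() now return consistent keys in the original order (GH 12679, GH 11227 and GH 39007)
Bug in DataFrame.stack() not handling NaN in MultiIndex columns correctly (GH 39481)
Bug in DataFrame.apply() giving incorrect results when the argument func was a string, axis=1, and the axis argument was not supported; it now raises a ValueError instead (GH 39211)
Bug in DataFrame.sort_values() not reshaping the index correctly after sorting on columns when ignore_index=True (GH 39464)
Bug in DataFrame.append() returning incorrect dtypes with combinations of ExtensionDtype dtypes (GH 39454)
Bug in DataFrame.append() returning incorrect dtypes when used with combinations of datetime64 and timedelta64 dtypes (GH 39574)
Bug in DataFrame.append() with a DataFrame with a MultiIndex when appending a Series whose Index is not a MultiIndex (GH 41707)
Bug in DataFrame.pivot_table() returning a MultiIndex for a single value when operating on an empty DataFrame (GH 13483)
Index can now be passed to the numpy.all() function (GH 40180)
Bug in DataFrame.stack() not preserving CategoricalDtype in a MultiIndex (GH 36991)
Bug in to_datetime() raising an error when the input sequence contained unhashable items (GH 39756)
Bug in Series.explode() preserving the index when ignore_index was True and the values were scalars (GH 40487)
Bug in to_datetime() raising a ValueError when a Series contains None and NaT and has more than 50 elements (GH 39882)
Bug in Series.unstack() and DataFrame.unstack() with object-dtype values containing timezone-aware datetime objects incorrectly raising TypeError (GH 41875)
Bug in DataFrame.melt() raising InvalidIndexError when the DataFrame has duplicate columns used as value_vars (GH 41951)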
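
To illustrate the Series.explode() entry (GH 40487), here is a minimal sketch with scalar values and a non-default index. The values and index labels are illustrative assumptions.

```python
import pandas as pd

# Scalar-only values: nothing to explode, but ignore_index=True
# should still reset the index to 0..n-1.
scalars = pd.Series([1, 2, 3], index=[10, 20, 30])
exploded_scalars = scalars.explode(ignore_index=True)

# List-like values for comparison: the index is reset here as well.
nested = pd.Series([[1, 2], [3]], index=[10, 20])
exploded_nested = nested.explode(ignore_index=True)

print(exploded_scalars.index.tolist())
print(exploded_nested.index.tolist())
```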

Sparse#

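Bug in DataFrame.sparse.to_coo() raising a KeyError with columns that are a numeric Index without a 0 (GH 18414)
Bug in SparseArray.astype() with copy=False producing incorrect results when going from an integer dtype to a floating dtype (GH 34456)
Bug in SparseArray.max() and SparseArray.min() always returning an empty result (GH 40921)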
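
A quick check of the SparseArray.max() and SparseArray.min() fix, using a small made-up integer array whose fill value is the default 0.

```python
import pandas as pd

# Two explicit values (3 and -1) stored sparsely; zeros are the fill value.
sparse = pd.arrays.SparseArray([0, 0, 3, -1])

# Both reductions should take stored values and the fill value into account.
largest = sparse.max()
smallest = sparse.min()

print(largest, smallest)
```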

ExtensionArray#

Bug in DataFrame.where() when other is a Series with an ExtensionDtype (GH 38729)
Fixed bug where Series.idxmax(), Series.idxmin(), Series.argmax(), and Series.argmin() would fail when the underlying data is an ExtensionArray (GH 32749, GH 33719, GH 36566)
Fixed bug where some properties of subclasses of PandasExtensionDtype were improperly cached (GH 40329)
Bug in DataFrame.mask() where masking a DataFrame with an ExtensionDtype raised a ValueError (GH 40941)
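
The idxmax/idxmin/argmax/argmin fix can be illustrated with a Series backed by the nullable Int64 extension array. The labels and values are illustrative assumptions.

```python
import pandas as pd

# Series whose underlying data is an ExtensionArray (nullable Int64).
ser = pd.Series([1, 3, 2], index=["a", "b", "c"], dtype="Int64")

label_of_max = ser.idxmax()     # index label of the largest value
label_of_min = ser.idxmin()     # index label of the smallest value
position_of_max = ser.argmax()  # integer position of the largest value
position_of_min = ser.argmin()  # integer position of the smallest value

print(label_of_max, label_of_min, position_of_max, position_of_min)
```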

Styler#

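Bug in Styler where the subset argument in methods raised an error for some valid MultiIndex slices (GH 33562)
The HTML output rendered by Styler has seen minor alterations to support w3 good code standards (GH 39626)
Bug in Styler where the rendered HTML was missing a column class identifier for certain header cells (GH 39716)
Bug in Styler.background_gradient() where the text color was not determined correctly (GH 39888)
Bug in Styler.set_table_styles() where multiple elements in the CSS selectors of the table_styles argument were not correctly added (GH 34061)
Bug in Styler where copying from Jupyter dropped the top-left cell and misaligned the headers (GH 12147)
Bug in Styler.where where kwargs were not passed to the applicable callable (GH 40845)
Bug in Styler causing CSS to be duplicated on multiple renders (GH 39395, GH 40334)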

Other#

inspect.getmembers(Series) no longer raises an AbstractMethodError (GH 38782)
Bug in Series.where() with a numeric dtype and other=None not casting to nan (GH 39761)
Bug in assert_series_equal(), assert_frame_equal(), assert_index_equal() and assert_extension_array_equal() incorrectly raising when an attribute has an unrecognized NA type (GH 39461)
Bug in assert_index_equal() with exact=True not raising when comparing CategoricalIndex instances with Int64Index and RangeIndex categories (GH 41263)
Bug in DataFrame.equals(), Series.equals(), and Index.equals() with object dtype containing np.datetime64("NaT") or np.timedelta64("NaT") (GH 39650)
Bug in show_versions() where the console JSON output was not proper JSON (GH 39701)
pandas can now compile on z/OS when using xlc (GH 35826)
Bug in pandas.util.hash_pandas_object() not recognizing hash_key, encoding and categorize when the input object type is a DataFrame (GH 41404)

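Contributors#

A total of 251 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.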

Abhishek R +
Ada Draginda
Adam J. Stewart
Adam Turner +
Aidan Feldman +
Ajitesh Singh +
Akshat Jain +
Albert Villanova del Moral
Alexandre Prince-Levasseur +
Andrew Hawyrluk +
Andrew Wieteska
AnglinaBhambra +
Ankush Dua +
Anna Daglis
Ashlan Parker +
Ashwani +
Avinash Pancham
Ayushman Kumar +
BeanNan
Benoît Vinot
Bharat Raghunathan
Bijay Regmi +
Bobin Mathew +
Bogdan Pilyavets +
Brian Hulette +
Brian Sun +
Brock +
Bryan Cutler
Caleb +
Calvin Ho +
Chathura Widanage +
Chinmay Rane +
Chris Lynch
Chris Withers
Christos Petropoulos
Corentin Girard +
DaPy15 +
Damodara Puddu +
Daniel Hrisca
Daniel Saxton
DanielFEvans
Dare Adewumi +
Dave Willmer
David Schlachter +
David-dmh +
Deepang Raval +
Doris Lee +
Dr. Jan-Philip Gehrcke +
DriesS +
Dylan Percy
Erfan Nariman
Eric Leung
EricLeer +
Eve
Fangchen Li
Felix Divo
Florian Jetter
Fred Reiss
GFJ138 +
Gaurav Sheni +
Geoffrey B. Eisenbarth +
Gesa Stupperich +
Griffin Ansel +
Gustavo C. Maciel +
Heidi +
Henry +
Hung-Yi Wu +
Ian Ozsvald +
Irv Lustig
Isaac Chung +
Isaac Virshup
JHM Darbyshire (MBP) +
JHM Darbyshire (iMac) +
Jack Liu +
James Lamb +
Jeet Parekh
Jeff Reback
Jiezheng2018 +
Jody Klymak
Johan Kåhrström +
John McGuigan
Joris Van den Bossche
Jose
JoseNavy
Josh Dimarsky
Josh Friedlander
Joshua Klein +
Julia Signell
Julian Schnitzler +
Kaiqi Dong
Kasim Panjri +
Katie Smith +
Kelly +
Kenil +
Keppler, Kyle +
Kevin Sheppard
Khor Chean Wei +
Kiley Hewitt +
Larry Wong +
Lightyears +
Lucas Holtz +
Lucas Rodés-Guirao
Lucky Sivagurunathan +
Luis Pinto
Maciej Kos +
Marc Garcia
Marco Edward Gorelli +
Marco Gorelli
MarcoGorelli +
Mark Graham
Martin Dengler +
Martin Grigorov +
Marty Rudolf +
Matt Roeschke
Matthew Roeschke
Matthew Zeitlin
Max Bolingbroke
Maxim Ivanov
Maxim Kupfer +
Mayur +
MeeseeksMachine
Micael Jarniac
Michael Hsieh +
Michel de Ruiter +
Mike Roberts +
Miroslav Šedivý
Mohammad Jafar Mashhadi
Morisa Manzella +
Mortada Mehyar
Muktan +
Naveen Agrawal +
Noah
Nofar Mishraki +
Oleh Kozynets
Olga Matoula +
Oli +
Omar Afifi
Omer Ozarslan +
Owen Lamont +
Ozan Öğreden +
Pandas Development Team
Paolo Lammens
Parfait Gasana +
Patrick Hoefler
Paul McCarthy +
Paulo S. Costa +
Pav A
Peter
Pradyumna Rahul +
Punitvara +
QP Hou +
Rahul Chauhan
Rahul Sathanapalli
Richard Shadrach
Robert Bradshaw
Robin to Roxel
Rohit Gupta
Sam Purkis +
Samuel GIFFARD +
Sean M. Law +
Shahar Naveh +
ShaharNaveh +
Shiv Gupta +
Shrey Dixit +
Shudong Yang +
Simon Boehm +
Simon Hawkins
Sioned Baker +
Stefan Mejlgaard +
Steven Pitman +
Steven Schaerer +
Stéphane Guillou +
TLouf +
Tegar D Pratama +
Terji Petersen
Theodoros Nikolaou +
Thomas Dickson
Thomas Li
Thomas Smith
Thomas Yu +
ThomasBlauthQC +
Tim Hoffmann
Tom Augspurger
Torsten Wörtwein
Tyler Reddy
UrielMaD
Uwe L. Korn
Venaturum +
VirosaLi
Vladimir Podolskiy
Vyom Pathak +
WANG Aiyong
Waltteri Koskinen +
Wenjun Si +
William Ayd
Yeshwanth N +
Yuanhao Geng
Zito Relova +
aflah02 +
arredond +
attack68
cdknox +
chinggg +
fathomer +
ftrihardjo +
github-actions[bot] +
gunjan-solanki +
guru kiran
hasan-yaman
i-aki-y +
jbrockmendel
jmholzer +
jordi-crespo +
jotasi +
jreback
juliansmidek +
kylekeppler
lrepiton +
lucasrodes
maroth96 +
mikeronayne +
mlondschien
moink +
morrme
mschmookler +
mzeitlin11
na2 +
nofarmishraki +
partev
patrick
ptype
realead
rhshadrach
rlukevie +
rosagold +
saucoide +
sdementen +
shawnbrown
sstiijn +
stphnlyd +
sukriti1 +
taytzehao
theOehrly +
theodorju +
thordisstella +
tonyyyyip +
tsinggggg +
tushushu +
vangorade +
vladu +
wertha +
