pandas.DataFrame

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

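Two-dimensional, size-mutable, potentially heterogeneous tabular data.

The data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects. It is the primary pandas data structure.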
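Label alignment in arithmetic is easiest to see with two frames whose labels only partly overlap. The following is a minimal sketch; the frame names `left` and `right` and their contents are made up for illustration.

```python
import pandas as pd

# Two frames whose row and column labels only partly overlap.
left = pd.DataFrame({"x": [1, 2], "y": [10, 20]}, index=["a", "b"])
right = pd.DataFrame({"y": [100, 200], "z": [5, 6]}, index=["b", "c"])

# Addition matches cells by (row label, column label), not by position.
total = left + right

# Only the cell ("b", "y") exists in both inputs, so it is the only
# non-missing value in the result; every other cell is NaN.
print(total)
print(total.loc["b", "y"])  # 120.0
```

The result covers the union of both label sets, which is why unmatched cells show up as NaN rather than being dropped.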

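Parameters:

data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame

A dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion order. If a dict contains Series that have an index defined, it is aligned by that index. The same alignment occurs if data is itself a Series or a DataFrame.

If data is a list of dicts, column order follows insertion order.

index : Index or array-like

Index to use for the resulting frame. Defaults to RangeIndex if the input data carries no indexing information and no index is provided.

columns : Index or array-like

Column labels to use for the resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, column selection is performed instead.

dtype : dtype, default None

Data type to force. Only a single dtype is allowed. If None, the dtype is inferred. Ignored if data is a DataFrame.

copy : bool or None, default None

Copy data from inputs. For dict data, the default of None behaves like copy=True. For DataFrame or 2d ndarray input, the default of None behaves like copy=False. If data is a dict containing one or more Series (possibly of different dtypes), copy=False will ensure that these inputs are not copied.

Changed in version 1.3.0.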
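As a rough illustration of the `copy` parameter, the sketch below passes `copy=True` with a 2d ndarray and checks that the frame is decoupled from its source. Whether `copy=None` or `copy=False` actually shares memory may depend on the installed pandas version and its Copy-on-Write mode, so that case is deliberately left out. The names `arr` and `df_copy` are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

# copy=True asks the constructor for an independent copy of the input.
df_copy = pd.DataFrame(arr, columns=["a", "b"], copy=True)

# Mutating the source array afterwards leaves the DataFrame untouched.
arr[0, 0] = 99
print(df_copy.loc[0, "a"])  # 1

# The frame's data does not overlap the original buffer.
print(np.shares_memory(arr, df_copy.to_numpy()))  # False
```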

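See also

DataFrame.from_records: Constructor from tuples, also record arrays.
DataFrame.from_dict: From dicts of Series, arrays, or dicts.
read_csv: Read a comma-separated values (csv) file into DataFrame.
read_table: Read a general delimited file into DataFrame.
read_clipboard: Read text from the clipboard into DataFrame.

Notes

See the User Guide for more information.

Examples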

Constructing DataFrame from a dictionary.

>>> import numpy as np
>>> import pandas as pd
>>> d = {"col1": [1, 2], "col2": [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4

Notice that the inferred dtype is int64.

>>> df.dtypes
col1    int64
col2    int64
dtype: object
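Because a DataFrame is potentially heterogeneous, dtype inference runs per column. The sketch below uses a hypothetical frame `mixed` with integer, float, and string columns. The exact dtype reported for the string column varies between pandas versions, so no output is shown for it.

```python
import pandas as pd

# Each column gets its own inferred dtype.
mixed = pd.DataFrame({"n": [1, 2], "f": [0.5, 1.5], "s": ["u", "v"]})

print(mixed.dtypes)

# Check the kind of each numeric column without relying on printed output.
print(pd.api.types.is_integer_dtype(mixed["n"]))  # True
print(pd.api.types.is_float_dtype(mixed["f"]))    # True
```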

To enforce a single dtype:

>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1    int8
col2    int8
dtype: object

Constructing DataFrame from a dictionary including Series:

>>> d = {"col1": [0, 1, 2, 3], "col2": pd.Series([2, 3], index=[2, 3])}
>>> pd.DataFrame(data=d, index=[0, 1, 2, 3])
   col1  col2
0     0   NaN
1     1   NaN
2     2   2.0
3     3   3.0

Constructing DataFrame from numpy ndarray:

>>> df2 = pd.DataFrame(
...     np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=["a", "b", "c"]
... )
>>> df2
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

Constructing DataFrame from a numpy ndarray that has labeled columns:

>>> data = np.array(
...     [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
...     dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")],
... )
>>> df3 = pd.DataFrame(data, columns=["c", "a"])
>>> df3
   c  a
0  3  1
1  6  4
2  9  7
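Column selection through `columns` also applies when data is a dict, since its keys act as column labels. A small sketch follows; the label `col3` is deliberately absent from the input, to show that a requested label with no matching data yields an all-missing column.

```python
import pandas as pd

d = {"col1": [1, 2], "col2": [3, 4]}

# "col2" is selected from the dict; "col3" has no data behind it.
picked = pd.DataFrame(d, columns=["col2", "col3"])

print(list(picked.columns))        # ['col2', 'col3']
print(picked["col2"].tolist())     # [3, 4]
print(picked["col3"].isna().all())  # True
```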

Constructing DataFrame from dataclass:

>>> from dataclasses import make_dataclass
>>> Point = make_dataclass("Point", [("x", int), ("y", int)])
>>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
   x  y
0  0  0
1  0  3
2  2  3
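A list of dicts behaves much like a list of records: columns appear in the order their keys are first seen, and keys missing from a record become NaN. The sketch below uses made-up records to show this.

```python
import pandas as pd

# The second record lacks "b" and introduces "c".
records = [{"b": 1, "a": 2}, {"a": 3, "c": 4}]
frame = pd.DataFrame(records)

# Columns follow the order in which keys were first inserted.
print(list(frame.columns))  # ['b', 'a', 'c']

# Keys absent from a record are filled with NaN.
print(frame)
```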

Constructing DataFrame from Series/DataFrame:

>>> ser = pd.Series([1, 2, 3], index=["a", "b", "c"])
>>> df = pd.DataFrame(data=ser, index=["a", "c"])
>>> df
   0
a  1
c  3

>>> df1 = pd.DataFrame([1, 2, 3], index=["a", "b", "c"], columns=["x"])
>>> df2 = pd.DataFrame(data=df1, index=["a", "c"])
>>> df2
   x
a  1
c  3
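When `index` names a label that the input Series does not have, the alignment described above produces a missing value rather than an error. A minimal sketch, with the label `"z"` chosen because it is absent from the source:

```python
import pandas as pd

ser = pd.Series([1, 2, 3], index=["a", "b", "c"])

# "a" is taken from the Series; "z" has no match and becomes NaN.
reindexed = pd.DataFrame(data=ser, index=["a", "z"])

print(reindexed)
print(reindexed.loc["a", 0])            # 1.0
print(pd.isna(reindexed.loc["z", 0]))   # True
```

Introducing NaN also changes the column from integer to floating point, which is why the matched value prints as `1.0`.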

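Attributes

`T`	The transpose of the DataFrame.
`at`	Access a single value for a row/column label pair.
`attrs`	Dictionary of global attributes of this dataset.
`axes`	Return a list representing the axes of the DataFrame.
`columns`	The column labels of the DataFrame.
`dtypes`	Return the dtypes in the DataFrame.
`empty`	Indicator whether the Series/DataFrame is empty.
`flags`	Get the properties associated with this pandas object.
`iat`	Access a single value for a row/column pair by integer position.
`iloc`	Purely integer-location based indexing for selection by position.
`index`	The index (row labels) of the DataFrame.
`loc`	Access a group of rows and columns by label(s) or a boolean array.
`ndim`	Return an int representing the number of axes / array dimensions.
`shape`	Return a tuple representing the dimensionality of the DataFrame.
`size`	Return an int representing the number of elements in this object.
`style`	Return a Styler object.
`values`	Return a NumPy representation of the DataFrame.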
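Several of these attributes can be tried on a small frame. The sketch below is illustrative only; the frame `df` and its labels are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6]},
    index=["r0", "r1", "r2"],
)

# Shape-related attributes.
print(df.shape)    # (3, 2)
print(df.size)     # 6
print(df.ndim)     # 2
print(df.empty)    # False
print(df.T.shape)  # (2, 3)

# Axis labels.
print(list(df.index))    # ['r0', 'r1', 'r2']
print(list(df.columns))  # ['col1', 'col2']

# Scalar access: by label (at) and by integer position (iat).
print(df.at["r1", "col2"])  # 5
print(df.iat[0, 1])         # 4

# Row/column access: by label (loc) and by position (iloc).
print(df.loc["r2", "col1"])   # 3
print(df.iloc[-1].tolist())   # [3, 6]
```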

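Methods

`abs`()	Return a Series/DataFrame with the absolute numeric value of each element.
`add`(other[, axis, level, fill_value])	Element-wise addition of the DataFrame and other (binary operator add).
`add_prefix`(prefix[, axis])	Prefix labels with the string prefix.
`add_suffix`(suffix[, axis])	Suffix labels with the string suffix.
`agg`([func, axis])	Aggregate using one or more operations over the specified axis.
`aggregate`([func, axis])	Aggregate using one or more operations over the specified axis.
`align`(other[, join, axis, level, copy, ...])	Align two objects on their axes with the specified join method.
`all`(*[, axis, bool_only, skipna])	Return whether all elements are True, potentially over an axis.
`any`(*[, axis, bool_only, skipna])	Return whether any element is True, potentially over an axis.
`apply`(func[, axis, raw, result_type, args, ...])	Apply a function along an axis of the DataFrame.
`asfreq`(freq[, method, how, normalize, ...])	Convert a time series to the specified frequency.
`asof`(where[, subset])	Return the last row(s) without any NaNs before where.
`assign`(**kwargs)	Assign new columns to a DataFrame.
`astype`(dtype[, copy, errors])	Cast a pandas object to the specified dtype `dtype`.
`at_time`(time[, asof, axis])	Select values at a particular time of day (e.g., 9:30 AM).
`between_time`(start_time, end_time[, ...])	Select values between particular times of the day (e.g., 9:00-9:30 AM).
`bfill`(*[, axis, inplace, limit, limit_area])	Fill NA/NaN values by using the next valid observation to fill the gap.
`boxplot`([column, by, ax, fontsize, rot, ...])	Make a box plot from DataFrame columns.
`clip`([lower, upper, axis, inplace])	Trim values at the input threshold(s).
`combine`(other, func[, fill_value, overwrite])	Perform a column-wise combine with another DataFrame.
`combine_first`(other)	Update null elements with the value in the same location in other.
`compare`(other[, align_axis, keep_shape, ...])	Compare to another DataFrame and show the differences.
`convert_dtypes`([infer_objects, ...])	Convert columns from numpy dtypes to the best dtypes that support `pd.NA`.
`copy`([deep])	Make a copy of this object's indices and data.
`corr`([method, min_periods, numeric_only])	Compute pairwise correlation of columns, excluding NA/null values.
`corrwith`(other[, axis, drop, method, ...])	Compute pairwise correlation.
`count`([axis, numeric_only])	Count non-NA cells for each column or row.
`cov`([min_periods, ddof, numeric_only])	Compute pairwise covariance of columns, excluding NA/null values.
`cummax`([axis, skipna, numeric_only])	Return the cumulative maximum over a DataFrame or Series axis.
`cummin`([axis, skipna, numeric_only])	Return the cumulative minimum over a DataFrame or Series axis.
`cumprod`([axis, skipna, numeric_only])	Return the cumulative product over a DataFrame or Series axis.
`cumsum`([axis, skipna, numeric_only])	Return the cumulative sum over a DataFrame or Series axis.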
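`describe`([percentiles, include, exclude])	Generate descriptive statistics.
`diff`([periods, axis])	First discrete difference of elements.
`div`(other[, axis, level, fill_value])	Element-wise floating division of the DataFrame and other (binary operator truediv).
`divide`(other[, axis, level, fill_value])	Element-wise floating division of the DataFrame and other (binary operator truediv).
`dot`(other)	Compute the matrix multiplication between the DataFrame and other.
`drop`([labels, axis, index, columns, level, ...])	Drop the specified labels from rows or columns.
`drop_duplicates`([subset, keep, inplace, ...])	Return a DataFrame with duplicate rows removed.
`droplevel`(level[, axis])	Return a Series/DataFrame with the requested index / column level(s) removed.
`dropna`(*[, axis, how, thresh, subset, ...])	Remove missing values.
`duplicated`([subset, keep])	Return a boolean Series denoting duplicate rows.
`eq`(other[, axis, level])	Element-wise equal-to comparison of the DataFrame and other (binary operator eq).
`equals`(other)	Test whether two objects contain the same elements.
`eval`(expr, *[, inplace])	Evaluate a string describing operations on DataFrame columns.
`ewm`([com, span, halflife, alpha, ...])	Provide exponentially weighted (EW) calculations.
`expanding`([min_periods, method])	Provide expanding window calculations.
`explode`(column[, ignore_index])	Transform each element of a list-like to a row, replicating index values.
`ffill`(*[, axis, inplace, limit, limit_area])	Fill NA/NaN values by propagating the last valid observation to the next valid one.
`fillna`(value, *[, axis, inplace, limit])	Fill NA/NaN values with value.
`filter`([items, like, regex, axis])	Subset the DataFrame or Series according to the specified index labels.
`first_valid_index`()	Return the index of the first non-missing value, or None if no value is found.
`floordiv`(other[, axis, level, fill_value])	Element-wise integer division of the DataFrame and other (binary operator floordiv).
`from_dict`(data[, orient, dtype, columns])	Construct a DataFrame from a dict of array-likes or dicts.
`from_records`(data[, index, exclude, ...])	Convert a structured or record ndarray to a DataFrame.
`ge`(other[, axis, level])	Element-wise greater-than-or-equal-to comparison of the DataFrame and other (binary operator ge).
`get`(key[, default])	Get the item from the object for the given key (e.g., a DataFrame column).
`groupby`([by, level, as_index, sort, ...])	Group the DataFrame using a mapper or by a Series of columns.
`gt`(other[, axis, level])	Element-wise greater-than comparison of the DataFrame and other (binary operator gt).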
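`head`([n])	Return the first n rows.
`hist`([column, by, grid, xlabelsize, xrot, ...])	Make a histogram of the DataFrame's columns.
`idxmax`([axis, skipna, numeric_only])	Return the index of the first occurrence of the maximum over the requested axis.
`idxmin`([axis, skipna, numeric_only])	Return the index of the first occurrence of the minimum over the requested axis.
`infer_objects`([copy])	Attempt to infer better dtypes for object columns.
`info`([verbose, buf, max_cols, memory_usage, ...])	Print a concise summary of a DataFrame.
`insert`(loc, column, value[, allow_duplicates])	Insert a column into the DataFrame at the specified location.
`interpolate`([method, axis, limit, inplace, ...])	Fill NaN values using an interpolation method.
`isetitem`(loc, value)	Set the given value in the column with position loc.
`isin`(values)	Whether each element in the DataFrame is contained in values.
`isna`()	Detect missing values.
`isnull`()	DataFrame.isnull is an alias for DataFrame.isna.
`items`()	Iterate over (column name, Series) pairs.
`iterrows`()	Iterate over DataFrame rows as (index, Series) pairs.
`itertuples`([index, name])	Iterate over DataFrame rows as namedtuples.
`join`(other[, on, how, lsuffix, rsuffix, ...])	Join columns of another DataFrame.
`keys`()	Get the 'info axis' (see Indexing for more).
`kurt`(*[, axis, skipna, numeric_only])	Return the unbiased kurtosis over the requested axis.
`kurtosis`(*[, axis, skipna, numeric_only])	Return the unbiased kurtosis over the requested axis.
`last_valid_index`()	Return the index of the last non-missing value, or None if no value is found.
`le`(other[, axis, level])	Element-wise less-than-or-equal-to comparison of the DataFrame and other (binary operator le).
`lt`(other[, axis, level])	Element-wise less-than comparison of the DataFrame and other (binary operator lt).
`map`(func[, na_action])	Apply a function to each element of the DataFrame.
`mask`(cond[, other, inplace, axis, level])	Replace values where the condition is True.
`max`(*[, axis, skipna, numeric_only])	Return the maximum of the values over the requested axis.
`mean`(*[, axis, skipna, numeric_only])	Return the mean of the values over the requested axis.
`median`(*[, axis, skipna, numeric_only])	Return the median of the values over the requested axis.
`melt`([id_vars, value_vars, var_name, ...])	Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
`memory_usage`([index, deep])	Return the memory usage of each column in bytes.
`merge`(right[, how, on, left_on, right_on, ...])	Merge DataFrame or named Series objects with a database-style join.
`min`(*[, axis, skipna, numeric_only])	Return the minimum of the values over the requested axis.
`mod`(other[, axis, level, fill_value])	Element-wise modulo of the DataFrame and other (binary operator mod).
`mode`([axis, numeric_only, dropna])	Get the mode(s) of each element along the selected axis.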
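`mul`(other[, axis, level, fill_value])	Element-wise multiplication of the DataFrame and other (binary operator mul).
`multiply`(other[, axis, level, fill_value])	Element-wise multiplication of the DataFrame and other (binary operator mul).
`ne`(other[, axis, level])	Element-wise not-equal-to comparison of the DataFrame and other (binary operator ne).
`nlargest`(n, columns[, keep])	Return the first n rows ordered by columns in descending order.
`notna`()	Detect existing (non-missing) values.
`notnull`()	DataFrame.notnull is an alias for DataFrame.notna.
`nsmallest`(n, columns[, keep])	Return the first n rows ordered by columns in ascending order.
`nunique`([axis, dropna])	Count the number of distinct elements in the specified axis.
`pct_change`([periods, fill_method, freq])	Fractional change between the current and a prior element.
`pipe`(func, *args, **kwargs)	Apply chainable functions that expect Series or DataFrames.
`pivot`(*, columns[, index, values])	Return a reshaped DataFrame organized by the given index / column values.
`pivot_table`([values, index, columns, ...])	Create a spreadsheet-style pivot table as a DataFrame.
`pop`(item)	Return the item and drop it from the DataFrame.
`pow`(other[, axis, level, fill_value])	Element-wise exponential power of the DataFrame and other (binary operator pow).
`prod`(*[, axis, skipna, numeric_only, min_count])	Return the product of the values over the requested axis.
`product`(*[, axis, skipna, numeric_only, ...])	Return the product of the values over the requested axis.
`quantile`([q, axis, numeric_only, ...])	Return values at the given quantile over the requested axis.
`query`(expr, *[, inplace])	Query the columns of a DataFrame with a boolean expression.
`radd`(other[, axis, level, fill_value])	Element-wise addition of the DataFrame and other, with operands reversed (binary operator radd).
`rank`([axis, method, numeric_only, ...])	Compute numerical data ranks (1 through n) along an axis.
`rdiv`(other[, axis, level, fill_value])	Element-wise floating division of the DataFrame and other, with operands reversed (binary operator rtruediv).
`reindex`([labels, index, columns, axis, ...])	Conform the DataFrame to a new index with optional filling logic.
`reindex_like`(other[, method, copy, limit, ...])	Return an object with indices matching those of other.
`rename`([mapper, index, columns, axis, copy, ...])	Rename columns or index labels.
`rename_axis`([mapper, index, columns, axis, ...])	Set the name of the axis for the index or columns.
`reorder_levels`(order[, axis])	Rearrange index or column levels using the input `order`.
`replace`([to_replace, value, inplace, regex])	Replace the values given in to_replace with value.
`resample`(rule[, closed, label, convention, ...])	Resample time-series data.
`reset_index`([level, drop, inplace, ...])	Reset the index, or a level of it.
`rfloordiv`(other[, axis, level, fill_value])	Element-wise integer division of the DataFrame and other, with operands reversed (binary operator rfloordiv).
`rmod`(other[, axis, level, fill_value])	Element-wise modulo of the DataFrame and other, with operands reversed (binary operator rmod).
`rmul`(other[, axis, level, fill_value])	Element-wise multiplication of the DataFrame and other, with operands reversed (binary operator rmul).
`rolling`(window[, min_periods, center, ...])	Provide rolling window calculations.
`round`([decimals])	Round a DataFrame to a variable number of decimal places.
`rpow`(other[, axis, level, fill_value])	Element-wise exponential power of the DataFrame and other, with operands reversed (binary operator rpow).
`rsub`(other[, axis, level, fill_value])	Element-wise subtraction of the DataFrame and other, with operands reversed (binary operator rsub).
`rtruediv`(other[, axis, level, fill_value])	Element-wise floating division of the DataFrame and other, with operands reversed (binary operator rtruediv).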
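`sample`([n, frac, replace, weights, ...])	Return a random sample of items from an axis of the object.
`select_dtypes`([include, exclude])	Return a subset of the DataFrame's columns based on the column dtypes.
`sem`(*[, axis, skipna, ddof, numeric_only])	Return the unbiased standard error of the mean over the requested axis.
`set_axis`(labels, *[, axis, copy])	Assign the desired index to the given axis.
`set_flags`(*[, copy, allows_duplicate_labels])	Return a new object with updated flags.
`set_index`(keys, *[, drop, append, inplace, ...])	Set the DataFrame index using existing columns.
`shift`([periods, freq, axis, fill_value, suffix])	Shift the index by the desired number of periods with an optional time freq.
`skew`(*[, axis, skipna, numeric_only])	Return the unbiased skew over the requested axis.
`sort_index`(*[, axis, level, ascending, ...])	Sort the object by labels (along an axis).
`sort_values`(by, *[, axis, ascending, ...])	Sort by the values along either axis.
`squeeze`([axis])	Squeeze 1-dimensional axis objects into scalars.
`stack`([level, dropna, sort, future_stack])	Stack the prescribed level(s) from columns to index.
`std`(*[, axis, skipna, ddof, numeric_only])	Return the sample standard deviation over the requested axis.
`sub`(other[, axis, level, fill_value])	Element-wise subtraction of the DataFrame and other (binary operator sub).
`subtract`(other[, axis, level, fill_value])	Element-wise subtraction of the DataFrame and other (binary operator sub).
`sum`(*[, axis, skipna, numeric_only, min_count])	Return the sum of the values over the requested axis.
`swaplevel`([i, j, axis])	Swap levels i and j in a `MultiIndex`.
`tail`([n])	Return the last n rows.
`take`(indices[, axis])	Return the elements at the given positional indices along an axis.
`to_clipboard`(*[, excel, sep])	Copy the object to the system clipboard.
`to_csv`([path_or_buf, sep, na_rep, ...])	Write the object to a comma-separated values (csv) file.
`to_dict`([orient, into, index])	Convert the DataFrame to a dictionary.
`to_excel`(excel_writer, *[, sheet_name, ...])	Write the object to an Excel sheet.
`to_feather`(path, **kwargs)	Write a DataFrame to the binary Feather format.
`to_hdf`(path_or_buf, *, key[, mode, ...])	Write the contained data to an HDF5 file using HDFStore.
`to_html`([buf, columns, col_space, header, ...])	Render a DataFrame as an HTML table.
`to_json`([path_or_buf, orient, date_format, ...])	Convert the object to a JSON string.
`to_latex`([buf, columns, header, index, ...])	Render the object to a LaTeX tabular, longtable, or nested table.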
`to_markdown`([buf, mode, index, storage_options])	Print the DataFrame in Markdown-friendly format.
`to_numpy`([dtype, copy, na_value])	Convert the DataFrame to a NumPy array.
`to_orc`([path, engine, index, engine_kwargs])	Write a DataFrame to the Optimized Row Columnar (ORC) format.
`to_parquet`([path, engine, compression, ...])	Write a DataFrame to the binary parquet format.
`to_period`([freq, axis, copy])	Convert the DataFrame from DatetimeIndex to PeriodIndex.
`to_pickle`(path, *[, compression, protocol, ...])	Pickle (serialize) the object to a file.
`to_records`([index, column_dtypes, index_dtypes])	Convert the DataFrame to a NumPy record array.
`to_sql`(name, con, *[, schema, if_exists, ...])	Write records stored in a DataFrame to a SQL database.
`to_stata`(path, *[, convert_dates, ...])	Export the DataFrame object to Stata dta format.
`to_string`([buf, columns, col_space, header, ...])	Render a DataFrame to console-friendly tabular output.
`to_timestamp`([freq, how, axis, copy])	Cast PeriodIndex to a DatetimeIndex of timestamps, at the beginning of the period.
`to_xarray`()	Return an xarray object from the pandas object.
`to_xml`([path_or_buffer, index, root_name, ...])	Render a DataFrame to an XML document.
`transform`(func[, axis])	Call `func` on self, producing a DataFrame with the same axis shape as self.
`transpose`(*args[, copy])	Transpose index and columns.
`truediv`(other[, axis, level, fill_value])	Element-wise floating division of the DataFrame and other (binary operator truediv).
`truncate`([before, after, axis, copy])	Truncate a Series or DataFrame before and after some index value.
`tz_convert`(tz[, axis, level, copy])	Convert a tz-aware axis to the target time zone.
`tz_localize`(tz[, axis, level, copy, ...])	Localize the time zone naive index of a Series or DataFrame to the target time zone.
`unstack`([level, fill_value, sort])	Pivot a level of the (necessarily hierarchical) index labels.
`update`(other[, join, overwrite, ...])	Modify in place using non-NA values from another DataFrame.
`value_counts`([subset, normalize, sort, ...])	Return a Series containing the frequency of each distinct row in the DataFrame.
`var`(*[, axis, skipna, ddof, numeric_only])	Return the unbiased variance over the requested axis.
`where`(cond[, other, inplace, axis, level])	Replace values where the condition is False.
`xs`(key[, axis, level, drop_level])	Return a cross-section from the Series/DataFrame.
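A few of the methods listed above are easy to confuse, so the sketch below contrasts them on a tiny frame: the comparison wrappers `eq`, `lt`, and `le`, the complementary pair `where` and `mask`, and `pipe` forwarding extra positional and keyword arguments. The frame `df` and the helper `scale` are hypothetical and exist only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})

# Comparison wrappers mirror the operators ==, < and <=.
print(df.eq(2)["a"].tolist())  # [False, True, False]
print(df.lt(2)["a"].tolist())  # [True, False, False]
print(df.le(2)["b"].tolist())  # [False, True, True]

# where keeps values where the condition holds and replaces the rest;
# mask does the opposite and replaces values where the condition holds.
kept = df.where(df > 1, 0)
hidden = df.mask(df > 1, 0)
print(kept["a"].tolist())    # [0, 2, 3]
print(hidden["a"].tolist())  # [1, 0, 0]


def scale(frame, factor, offset=0):
    """Hypothetical helper: multiply every value, then add an offset."""
    return frame * factor + offset


# pipe passes the frame as the first argument, followed by *args and **kwargs.
piped = df.pipe(scale, 10, offset=1)
print(piped["a"].tolist())  # [11, 21, 31]
```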