pyspark.pandas.DataFrame.to_records

DataFrame.to_records(index: bool = True, column_dtypes: Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype, Dict[Union[Any, Tuple[Any, …]], Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype]], None] = None, index_dtypes: Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype, Dict[Union[Any, Tuple[Any, …]], Union[str, numpy.dtype, pandas.core.dtypes.base.ExtensionDtype]], None] = None) → numpy.recarray

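Convert DataFrame to a NumPy record array.

The index is included as the first field of the record array if requested.

Note

This method should only be used if the resulting NumPy ndarray is expected to be small, because all the data is loaded into the driver's memory.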

Parameters

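index bool, default True: Include the index in the resulting record array, stored in the 'index' field or under the index label, if one is set.
column_dtypes str, type, dict, default None: If a string or type, the data type used to store all columns. If a dictionary, a mapping of column names and positions (zero-indexed) to specific data types.
index_dtypes str, type, dict, default None: If a string or type, the data type used to store all index levels. If a dictionary, a mapping of index level names and positions (zero-indexed) to specific data types. This mapping is applied only if index=True.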
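
As a minimal sketch of the dictionary form of column_dtypes, the snippet below uses plain pandas rather than pandas-on-Spark, on the assumption that both follow the same to_records semantics. The frame and the chosen dtypes are illustrative only. It keys one column by name and the other by its zero-based position.

```python
import pandas as pd

# Illustrative frame; plain pandas stands in for pyspark.pandas here (assumption).
df = pd.DataFrame({"A": [1, 2], "B": [0.5, 0.75]})

# 'A' is matched by column name, 'B' by its zero-based column position (1).
rec = df.to_records(index=False, column_dtypes={"A": "int32", 1: "float32"})

# Inspect the resulting field names and per-field dtypes.
field_names = rec.dtype.names
a_dtype = rec.dtype["A"].name
b_dtype = rec.dtype["B"].name
```

Columns that appear in neither form keep their original dtype.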

Returns

numpy.recarray: NumPy ndarray with the DataFrame labels as fields and each row of the DataFrame as an entry.

See Also

DataFrame.from_records: Convert a structured or record ndarray to a DataFrame.
numpy.recarray: An ndarray that allows field access using attributes, analogous to typed columns in a spreadsheet.
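
To illustrate the attribute-style field access that numpy.recarray offers, here is a small sketch that builds a record array by hand with NumPy alone. The field names 'A' and 'B' and the values are illustrative and mirror the frame used in the examples on this page.

```python
import numpy as np

# Hand-built record array; field names and values are illustrative.
rec = np.rec.array([(1, 0.5), (2, 0.75)], dtype=[("A", "<i8"), ("B", "<f8")])

# A field can be read as an attribute or by key.
col_a = rec.A
col_b = rec["B"]

# Individual records are addressed by position and expose the same fields.
first = rec[0]
first_a = first.A
```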

Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
...                   index=['a', 'b'])
>>> df
   A     B
a  1  0.50
b  2  0.75

>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])

The index can be excluded from the record array:

>>> df.to_records(index=False)
rec.array([(1, 0.5 ), (2, 0.75)],
          dtype=[('A', '<i8'), ('B', '<f8')])

Specifying dtypes for columns is new in pandas 0.24.0. Data types can be specified for the columns:

>>> df.to_records(column_dtypes={"A": "int32"})
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('index', 'O'), ('A', '<i4'), ('B', '<f8')])

Specifying dtypes for the index is new in pandas 0.24.0. Data types can also be specified for the index:

>>> df.to_records(index_dtypes="<S2")
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
          dtype=[('index', 'S2'), ('A', '<i8'), ('B', '<f8')])
