torcharrow.functional.bucketize

torcharrow.functional.bucketize(value_col: NumericalColumn, borders: Union[ListColumn, List[Union[int, float]]]) → NumericalColumn

Bucketizes an input feature. This is a common operation in recommendation systems, used to convert dense features into sparse features.

Parameters:
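  • value_col (NumericalColumn) – Numerical column that defines the dense feature.

  • borders (Union[ListColumn, List[Union[int, float]]]) – Border values used to discretize the feature into buckets.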

Examples

>>> import torcharrow as ta
>>> from torcharrow import functional
>>> a = ta.column([1, 2, 3, 5, 8, 10, 11])
>>> functional.bucketize(a, [2, 5, 10])
0  0
1  0
2  1
3  1
4  2
5  2
6  3
dtype: Int32(nullable=True), length: 7, null_count: 0
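
The output above is consistent with assigning each value the index of the first border that is greater than or equal to it, so a value equal to a border lands in the lower bucket. As a rough sketch of that behavior in plain Python, the standard-library `bisect_left` reproduces the same indices. The helper name `bucketize_reference` is hypothetical and not part of torcharrow; the sketch assumes borders sorted in ascending order and no null values, and the boundary rule is inferred from the example rather than from the library's implementation.

```python
from bisect import bisect_left
from typing import List, Sequence, Union

Number = Union[int, float]


def bucketize_reference(values: Sequence[Number], borders: Sequence[Number]) -> List[int]:
    """Hypothetical pure-Python reference; NOT torcharrow's implementation."""
    # Assumption: `borders` is sorted ascending and `values` contains no nulls.
    # bisect_left returns the number of borders strictly less than v, so a
    # value equal to a border is placed in the bucket below that border.
    return [bisect_left(borders, v) for v in values]


print(bucketize_reference([1, 2, 3, 5, 8, 10, 11], [2, 5, 10]))
# prints [0, 0, 1, 1, 2, 2, 3]
```

With n borders the result takes values from 0 to n, which is why the last value, 11, maps to bucket 3 in the example.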