HitsAtK

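class HitsAtK(k: int = 10)[source]

Bases: RankBasedMetric

The Hits @ k.

The hits @ k describes the fraction of true entities that appear among the first \(k\) entities of the sorted rank list. Denoting the set of individual ranks as \(\mathcal{I}\), it is given as: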

\[H_k = \frac{1}{|\mathcal{I}|} \sum \limits_{r \in \mathcal{I}} \mathbb{I}[r \leq k]\]

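For example, if Google shows 20 results on the first page, then the percentage of those results that are relevant is the hits @ 20. Regardless of the value of \(k\), the hits @ k lies in the interval \([0, 1]\), where values closer to 1 are better.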
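As a minimal illustration of the formula above, the following plain-Python sketch computes the fraction of ranks that are at most \(k\). The function name `hits_at_k` and the example ranks are made up for illustration; this is not the library's implementation.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[float], k: int = 10) -> float:
    """Return the fraction of ranks that are less than or equal to k."""
    if not ranks:
        raise ValueError("at least one rank is required")
    # The indicator I[r <= k] is 1 for a hit and 0 for a miss.
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the true entity in four ranking tasks.
example_ranks = [1, 3, 11, 25]
print(hits_at_k(example_ranks, k=10))  # prints 0.5, since 2 of 4 ranks are <= 10
```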

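Warning

This metric does not differentiate between cases where the rank is larger than \(k\). This means that a miss at rank \(k+1\) and a miss at rank \(k+d\), where \(d \gg 1\), have the same effect on the final score. Therefore, it is less suitable for comparing different models.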
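The following sketch illustrates this insensitivity with two made-up rank lists that differ only in how badly they miss. The helper names `hits_at_k` and `mean_rank` are hypothetical and exist only for this example; the mean rank is included as a contrast because it does react to the size of a miss.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[float], k: int = 10) -> float:
    """Fraction of ranks that are less than or equal to k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_rank(ranks: Sequence[float]) -> float:
    """Arithmetic mean of the ranks, shown only for contrast."""
    return sum(ranks) / len(ranks)


# Both lists contain one hit and two misses for k = 10.
near_misses = [1, 11, 11]      # misses just outside the top 10
far_misses = [1, 5000, 9000]   # misses far outside the top 10

# Hits @ 10 cannot tell the two lists apart ...
print(hits_at_k(near_misses), hits_at_k(far_misses))
# ... whereas the mean rank differs by orders of magnitude.
print(mean_rank(near_misses), mean_rank(far_misses))
```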

For the expected value, we first note that

\[\mathbb{I}[r_i \leq k] \sim \textit{Bernoulli}(p_i)\]

with \(p_i = \min\{\frac{k}{N_i}, 1\}\). Thus, we have

\[\mathbb{E}[\mathbb{I}[r_i \leq k]] = p_i\]

and

\[\mathbb{V}[\mathbb{I}[r_i \leq k]] = p_i \cdot (1 - p_i)\]

Hence, we obtain

\[\begin{split}\mathbb{E}[Hits@k] &= \mathbb{E}\left[\frac{1}{n} \sum \limits_{i=1}^{n} \mathbb{I}[r_i \leq k]\right] \\ &= \frac{1}{n} \sum \limits_{i=1}^{n} \mathbb{E}[\mathbb{I}[r_i \leq k]] \\ &= \frac{1}{n} \sum \limits_{i=1}^{n} p_i\end{split}\]

For the variance, assuming the individual ranks are independent, we have

\[\begin{split}\mathbb{V}[Hits@k] &= \mathbb{V}\left[\frac{1}{n} \sum \limits_{i=1}^{n} \mathbb{I}[r_i \leq k]\right] \\ &= \frac{1}{n^2} \sum \limits_{i=1}^{n} \mathbb{V}\left[\mathbb{I}[r_i \leq k]\right] \\ &= \frac{1}{n^2} \sum \limits_{i=1}^{n} p_i(1 - p_i)\end{split}\]
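The two closed-form results above can be evaluated directly. The sketch below is a plain-Python transcription of the formulas for the unweighted case; the function name `hits_at_k_moments` and the candidate counts are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def hits_at_k_moments(num_candidates: Sequence[int], k: int = 10) -> Tuple[float, float]:
    """Return (expectation, variance) of Hits @ k under uniformly random ranks."""
    n = len(num_candidates)
    if n == 0:
        raise ValueError("at least one ranking task is required")
    # p_i = min(k / N_i, 1) is the probability that a uniform rank is <= k.
    probabilities = [min(k / n_i, 1.0) for n_i in num_candidates]
    expectation = sum(probabilities) / n
    # Each indicator is Bernoulli(p_i) with variance p_i * (1 - p_i).
    variance = sum(p * (1.0 - p) for p in probabilities) / n**2
    return expectation, variance


# Three hypothetical ranking tasks with 5, 20 and 40 candidates.
# For k = 10 this gives p = [1.0, 0.5, 0.25].
expectation, variance = hits_at_k_moments([5, 20, 40], k=10)
print(expectation, variance)
```

For these inputs the expectation is \(1.75 / 3\) and the variance is \(0.4375 / 9\).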

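Initialize the metric.

Parameters:: k (int) – the parameter \(k\) denotes the number of top entries to consider

Attributes Summary

`binarize`	whether the metric needs binarized scores
`closed_expectation`	whether there is a closed-form solution of the expectation
`closed_variance`	whether there is a closed-form solution of the variance
`increasing`	whether it is increasing, i.e., larger values are better
`key`	Return the key for use in metric result dictionaries.
`name`	The name of the metric
`needs_candidates`	whether the metric requires the number of candidates for each ranking task
`supported_rank_types`	the supported rank types.
`supports_weights`	whether the metric supports weights
`synonyms`	synonyms for this metric
`value_range`	the value range

Methods Summary

`__call__`(ranks[, num_candidates, weights])	Evaluate the metric.
`expected_value`(num_candidates[, ...])	Compute expected metric value.
`extra_repr`()	Generate the extra repr, cf. torch.nn.Module.extra_repr().
`get_description`()	Get the description.
`get_link`()	Get the link from the docdata.
`get_range`()	Get the math notation for the range of this metric.
`get_sampled_values`(num_candidates, num_samples)	Calculate the metric on sampled rank arrays.
`iter_extra_repr`()	Iterate over the components of the `extra_repr()`.
`numeric_expected_value`(**kwargs)	Compute expected metric value by summation.
`numeric_expected_value_with_ci`(**kwargs)	Estimate expected value with confidence intervals.
`numeric_variance`(**kwargs)	Compute variance by summation.
`numeric_variance_with_ci`(**kwargs)	Estimate variance with confidence intervals.
`std`(num_candidates[, num_samples, weights])	Compute the standard deviation.
`variance`(num_candidates[, num_samples, weights])	Compute variance.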

Attributes Documentation

binarize: ClassVar[bool] = False: whether the metric needs binarized scores

closed_expectation: ClassVar[bool] = True: whether there is a closed-form solution of the expectation

closed_variance: ClassVar[bool] = True: whether there is a closed-form solution of the variance

increasing: ClassVar[bool] = True: whether it is increasing, i.e., larger values are better

key

name: ClassVar[str] = 'Hits @ K': The name of the metric

needs_candidates: ClassVar[bool] = False: whether the metric requires the number of candidates for each ranking task

supported_rank_types: ClassVar[Collection[Literal['optimistic', 'realistic', 'pessimistic']]] = ('optimistic', 'realistic', 'pessimistic'): the supported rank types. Most of the time equal to all rank types

supports_weights: ClassVar[bool] = True: whether the metric supports weights

synonyms: ClassVar[Collection[str]] = ('h@k', 'hits@k', 'h@', 'hits@', 'hits_at_', 'h_at_'): synonyms for this metric

value_range: ClassVar[ValueRange] = ValueRange(lower=0, lower_inclusive=True, upper=1, upper_inclusive=True): the value range

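Methods Documentation

__call__(ranks: ndarray, num_candidates: ndarray | None = None, weights: ndarray | None = None) → float[source]

Evaluate the metric.

Parameters:

ranks (ndarray) – shape: s the individual ranks
num_candidates (ndarray | None) – shape: s the number of candidates for each individual ranking task
weights (ndarray | None) – shape: s the weights for the individual ranks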

Return type:

float
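The sketch below shows one plausible way in which per-rank weights could enter the metric, namely as a weighted average of the hit indicators. This is an assumption made for illustration: the documentation here only states that weights are supported, and the function name `weighted_hits_at_k` is hypothetical.

```python
from typing import Optional, Sequence


def weighted_hits_at_k(
    ranks: Sequence[float],
    k: int = 10,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted fraction of ranks that are <= k (assumed weighting scheme)."""
    if weights is None:
        # Without weights, every rank counts equally.
        weights = [1.0] * len(ranks)
    if len(weights) != len(ranks):
        raise ValueError("ranks and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("the weights must sum to a positive value")
    # Sum the weights of the hits and normalize by the total weight.
    return sum(w for r, w in zip(ranks, weights) if r <= k) / total


print(weighted_hits_at_k([1, 20], k=10))                  # unweighted: 0.5
print(weighted_hits_at_k([1, 20], k=10, weights=[3, 1]))  # the hit carries 3/4 of the weight: 0.75
```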

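expected_value(num_candidates: ndarray, num_samples: int | None = None, weights: ndarray | None = None, **kwargs) → float[source]

Compute expected metric value.

The expectation is computed under the assumption that each individual rank follows a discrete uniform distribution \(\mathcal{U}\left(1, N_i\right)\), where \(N_i\) denotes the number of candidates for ranking task \(r_i\).

Parameters:

num_candidates (ndarray) – the number of candidates for each individual rank computation
num_samples (int | None) – the number of samples to use for simulation, if no closed-form expected value is implemented
weights (ndarray | None) – shape: s the weights for the individual ranking tasks
kwargs – additional keyword-based parameters passed to get_sampled_values(), if no closed-form solution is available

Returns:

the expected value of this metric

Raises:

NoClosedFormError – raised if a closed-form expectation has not been implemented and no number of samples is given

Return type:

float

Note

Prefers the analytical solution, if available, but falls back to numeric estimation via summation, cf. RankBasedMetric.numeric_expected_value().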

extra_repr() → str

Generate the extra repr, cf. torch.nn.Module.extra_repr().

Returns:: the extra part of the repr()
Return type:: str

classmethod get_description() → str

Get the description.

Return type:: str

classmethod get_link() → str

Get the link from the docdata.

Return type:: str

classmethod get_range() → str

Get the math notation for the range of this metric.

Return type:: str

get_sampled_values(num_candidates: ndarray, num_samples: int, weights: ndarray | None = None, generator: Generator | None = None, memory_intense: bool = True) → ndarray

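Calculate the metric on sampled rank arrays.

Parameters:

num_candidates (ndarray) – shape: s the number of candidates for each ranking task
num_samples (int) – the number of samples
weights (ndarray | None) – shape: s the weights for the individual ranking tasks
generator (Generator | None) – a random state for reproducibility
memory_intense (bool) – whether to use a more memory-intense, but more time-efficient variant

Returns:

shape: (num_samples,) the metric evaluated on num_samples sampled rank arrays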

Return type:

ndarray
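To make the idea of evaluating the metric on sampled rank arrays concrete, here is a small standard-library sketch for Hits @ k. It assumes that each rank is drawn independently from the discrete uniform distribution on \(1, \ldots, N_i\); the function name `sample_hits_at_k` and its `seed` parameter are hypothetical, and weights as well as the `memory_intense` switch are not modelled.

```python
import random
from typing import List, Optional, Sequence


def sample_hits_at_k(
    num_candidates: Sequence[int],
    num_samples: int,
    k: int = 10,
    seed: Optional[int] = None,
) -> List[float]:
    """Evaluate Hits @ k on num_samples randomly drawn rank arrays."""
    rng = random.Random(seed)  # seeded for reproducibility
    values = []
    for _ in range(num_samples):
        # One sampled rank array: r_i is uniform on {1, ..., N_i}.
        ranks = [rng.randint(1, n_i) for n_i in num_candidates]
        values.append(sum(1 for r in ranks if r <= k) / len(ranks))
    return values


values = sample_hits_at_k([5, 20, 40], num_samples=2000, k=10, seed=0)
# The sample mean should be close to the closed-form expectation 1.75 / 3.
print(len(values), sum(values) / len(values))
```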

iter_extra_repr() → Iterable[str][source]

Iterate over the components of the extra_repr().

This method is typically overridden. A common pattern would be

def iter_extra_repr(self) -> Iterable[str]:
    yield from super().iter_extra_repr()
    yield "<key1>=<value1>"
    yield "<key2>=<value2>"

Returns:: an iterable over the individual components of the extra_repr()
Return type:: Iterable[str]
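The toy classes below show how the pattern can fit together. They assume that `extra_repr()` joins the yielded components with a comma and that `__repr__` wraps the result in the class name; both are assumptions made for this sketch, and `ReprBase` and `ToyHitsAtK` are hypothetical names that do not belong to the library.

```python
from typing import Iterable


class ReprBase:
    """Hypothetical base class that builds its repr from components."""

    def iter_extra_repr(self) -> Iterable[str]:
        # The base class contributes no components.
        return iter(())

    def extra_repr(self) -> str:
        # Assumed convention: join the components with ", ".
        return ", ".join(self.iter_extra_repr())

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.extra_repr()})"


class ToyHitsAtK(ReprBase):
    """Toy metric that adds its parameter k to the repr."""

    def __init__(self, k: int = 10) -> None:
        self.k = k

    def iter_extra_repr(self) -> Iterable[str]:
        # Keep the components of the parent class, then add our own.
        yield from super().iter_extra_repr()
        yield f"k={self.k}"


print(repr(ToyHitsAtK(k=5)))  # prints ToyHitsAtK(k=5)
```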

numeric_expected_value(**kwargs) → float

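Compute expected metric value by summation.

The expectation is computed under the assumption that each individual rank follows a discrete uniform distribution \(\mathcal{U}\left(1, N_i\right)\), where \(N_i\) denotes the number of candidates for ranking task \(r_i\).

Parameters:: kwargs – keyword-based parameters passed to get_sampled_values()
Returns:: the estimated expected value of this metric
Return type:: float

Warning

Depending on the metric, the estimate may not be very accurate and may converge slowly, cf. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_discrete.expect.html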
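For intuition about what an expectation "by summation" means, the sketch below sums the hit indicator over the support of the discrete uniform distribution for each ranking task. It is only an illustration for the unweighted Hits @ k case and not a description of how this method is implemented (the parameter description above indicates that the method relies on get_sampled_values()); the function names are hypothetical.

```python
from typing import Sequence


def expected_indicator_by_summation(n_i: int, k: int) -> float:
    """Sum P(r) * I[r <= k] over r = 1..N_i with P(r) = 1 / N_i."""
    return sum((1.0 / n_i) * (1.0 if r <= k else 0.0) for r in range(1, n_i + 1))


def expected_hits_at_k_by_summation(num_candidates: Sequence[int], k: int = 10) -> float:
    """Average the per-task expectations over all ranking tasks."""
    per_task = [expected_indicator_by_summation(n_i, k) for n_i in num_candidates]
    return sum(per_task) / len(per_task)


# Agrees with the closed form (1/n) * sum_i min(k / N_i, 1) up to rounding.
print(expected_hits_at_k_by_summation([5, 20, 40], k=10))
```

For this simple indicator each per-task sum has only \(N_i\) terms, so it reproduces the closed-form value up to floating-point rounding.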

numeric_expected_value_with_ci(**kwargs) → ndarray

Estimate expected value with confidence intervals.

Return type:: ndarray
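One generic way to attach a confidence interval to a simulated expectation is a normal approximation around the sample mean. The sketch below uses that approach for Hits @ k. It is an assumption made for illustration: neither the estimation procedure nor the layout of the returned array is specified here, and the function name `expected_hits_at_k_with_ci` and its parameters are hypothetical.

```python
import math
import random
import statistics
from typing import Optional, Sequence, Tuple


def expected_hits_at_k_with_ci(
    num_candidates: Sequence[int],
    num_samples: int,
    k: int = 10,
    seed: Optional[int] = None,
    z: float = 1.96,  # roughly a 95% interval under normality
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) from simulated Hits @ k values."""
    if num_samples < 2:
        raise ValueError("at least two samples are needed for a standard deviation")
    rng = random.Random(seed)
    values = []
    for _ in range(num_samples):
        # Draw one rank per task from the uniform distribution on {1, ..., N_i}.
        ranks = [rng.randint(1, n_i) for n_i in num_candidates]
        values.append(sum(1 for r in ranks if r <= k) / len(ranks))
    mean = statistics.fmean(values)
    # Standard error of the mean times the normal quantile.
    half_width = z * statistics.stdev(values) / math.sqrt(num_samples)
    return mean, mean - half_width, mean + half_width


mean, lower, upper = expected_hits_at_k_with_ci([5, 20, 40], num_samples=2000, k=10, seed=0)
print(mean, lower, upper)
```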

numeric_variance(**kwargs) → float

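Compute variance by summation.

The variance is computed under the assumption that each individual rank follows a discrete uniform distribution \(\mathcal{U}\left(1, N_i\right)\), where \(N_i\) denotes the number of candidates for ranking task \(r_i\).

Parameters:: kwargs – keyword-based parameters passed to get_sampled_values()
Returns:: the estimated variance of this metric
Return type:: float

Warning

Depending on the metric, the estimate may not be very accurate and may converge slowly, cf. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_discrete.expect.html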

numeric_variance_with_ci(**kwargs) → ndarray

Estimate variance with confidence intervals.

Return type:: ndarray

std(num_candidates: ndarray, num_samples: int | None = None, weights: ndarray | None = None, **kwargs) → float

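Compute the standard deviation.

Parameters:

num_candidates (ndarray) – the number of candidates for each individual rank computation
num_samples (int | None) – the number of samples to use for simulation, if no closed-form variance is implemented
weights (ndarray | None) – shape: s the weights for the individual ranking tasks
kwargs – additional keyword-based parameters passed to variance()

Returns:

the standard deviation of this metric, i.e., the square root of the variance

Return type:

float

For a detailed explanation, cf. RankBasedMetric.variance().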
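As a small worked example of this relationship for Hits @ k, the sketch below takes the square root of the closed-form variance \(\frac{1}{n^2} \sum_i p_i (1 - p_i)\) with \(p_i = \min\{k / N_i, 1\}\). It covers the unweighted case only, and the function name `hits_at_k_std` is hypothetical.

```python
import math
from typing import Sequence


def hits_at_k_std(num_candidates: Sequence[int], k: int = 10) -> float:
    """Standard deviation of Hits @ k under uniformly random ranks."""
    n = len(num_candidates)
    # Probability that a uniform rank on {1, ..., N_i} is at most k.
    probabilities = [min(k / n_i, 1.0) for n_i in num_candidates]
    variance = sum(p * (1.0 - p) for p in probabilities) / n**2
    return math.sqrt(variance)


# With N = [5, 20, 40] and k = 10 the variance is 0.4375 / 9.
print(hits_at_k_std([5, 20, 40], k=10))
```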

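variance(num_candidates: ndarray, num_samples: int | None = None, weights: ndarray | None = None, **kwargs) → float[source]

Compute variance.

The variance is computed under the assumption that each individual rank follows a discrete uniform distribution \(\mathcal{U}\left(1, N_i\right)\), where \(N_i\) denotes the number of candidates for ranking task \(r_i\).

Parameters:

num_candidates (ndarray) – the number of candidates for each individual rank computation
num_samples (int | None) – the number of samples to use for simulation, if no closed-form variance is implemented
weights (ndarray | None) – shape: s the weights for the individual ranking tasks
kwargs – additional keyword-based parameters passed to get_sampled_values(), if no closed-form solution is available

Returns:

the variance of this metric

Raises:

NoClosedFormError – raised if a closed-form variance has not been implemented and no number of samples is given

Return type:

float

Note

Prefers the analytical solution, if available, but falls back to numeric estimation via summation, cf. RankBasedMetric.numeric_variance().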