InMemoryLookupKB

class v3.5

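The default implementation of the `KnowledgeBase` interface. Stores all information in memory.

The `InMemoryLookupKB` class inherits from `KnowledgeBase` and implements all of its methods. It stores all knowledge base data in memory and generates `Candidate` objects by exactly matching mentions with entity names. The implementation is highly optimized for both a low memory footprint and fast retrieval.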

InMemoryLookupKB.__init__ method

Create the knowledge base.

Name	Description
`vocab`	The shared vocabulary. Vocab
`entity_vector_length`	Length of the fixed-size entity vectors. int
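
As a minimal sketch of the constructor described above, the snippet below creates an empty knowledge base. It assumes spaCy v3.5 or later is installed; the vector length of 64 is an arbitrary illustrative value, and a standalone `Vocab()` stands in for the `nlp.vocab` you would normally share with a pipeline.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

# A fresh, empty vocabulary (assumption: in a real pipeline you would pass nlp.vocab).
vocab = Vocab()

# 64 is an arbitrary illustrative length; every entity vector must match it.
kb = InMemoryLookupKB(vocab=vocab, entity_vector_length=64)

print(kb.entity_vector_length)
print(len(kb))  # no entities have been added yet
```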

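InMemoryLookupKB.entity_vector_length property

The length of the fixed-size entity vectors in the knowledge base.

Name	Description
RETURNS	Length of the fixed-size entity vectors. int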

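InMemoryLookupKB.add_entity method

Add an entity to the knowledge base, specifying its corpus frequency and entity vector. The vector should be of length `entity_vector_length`.

Name	Description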
`entity`	The unique entity identifier. str
`freq`	The frequency of the entity in a typical corpus. float
`entity_vector`	The pretrained vector of the entity. numpy.ndarray
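
The following sketch shows how entities might be added one at a time. The identifiers `Q42`, `Q463035` and `Q5`, the frequencies, and the vector values are illustrative placeholders, and plain Python lists of floats are used for the vectors for brevity. The last call deliberately passes a vector of the wrong length to show that it is rejected.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)

# Each vector has exactly entity_vector_length (here 3) values.
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q463035", freq=111, entity_vector=[4.0, 5.0, 6.0])

# A vector of the wrong length is rejected with a ValueError.
try:
    kb.add_entity(entity="Q5", freq=5, entity_vector=[1.0, 2.0])
    length_check_failed = False
except ValueError:
    length_check_failed = True

print(len(kb), length_check_failed)
```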

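InMemoryLookupKB.set_entities method

Define the full list of entities in the knowledge base, specifying the corpus frequency and entity vector for each entity.

Name	Description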
`entity_list`	List of unique entity identifiers. Iterable[Union[str, int]]
`freq_list`	List of entity frequencies. Iterable[int]
`vector_list`	List of entity vectors. Iterable[numpy.ndarray]
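
A minimal sketch of defining all entities in one call, under the same assumptions as before (placeholder IDs, frequencies and vectors). The three lists are parallel: position `i` in each list describes the same entity.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)

# Parallel lists: one ID, one frequency and one vector per entity.
kb.set_entities(
    entity_list=["Q42", "Q463035"],
    freq_list=[32, 111],
    vector_list=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
)

print(len(kb))
print(sorted(kb.get_entity_strings()))
```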

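InMemoryLookupKB.add_alias method

Add an alias or mention to the knowledge base, specifying its potential KB identifiers and their prior probabilities. The entity identifiers should refer to entities previously added with `add_entity` or `set_entities`. The sum of the prior probabilities should not exceed 1. Note that the empty string cannot be used as an alias.

Name	Description
`alias`	The textual mention or alias. Cannot be the empty string. str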
`entities`	The potential entities that the alias may refer to. Iterable[Union[str, int]]
`probabilities`	The prior probabilities of each entity. Iterable[float]
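
The sketch below registers one alias for two previously added entities and then shows what happens when the prior probabilities sum to more than 1. The alias strings and probability values are illustrative assumptions.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q463035", freq=111, entity_vector=[4.0, 5.0, 6.0])

# Priors 0.6 + 0.3 = 0.9, which stays within the limit of 1.
kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])

# Priors 0.8 + 0.4 = 1.2 exceed 1, so the alias is rejected.
try:
    kb.add_alias(alias="Doug", entities=["Q42", "Q463035"], probabilities=[0.8, 0.4])
    sum_check_failed = False
except ValueError:
    sum_check_failed = True

print(kb.get_size_aliases(), sum_check_failed)
```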

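InMemoryLookupKB.__len__ method

Get the total number of entities in the knowledge base.

Name	Description
RETURNS	The number of entities in the knowledge base. int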

InMemoryLookupKB.get_entity_strings method

Get a list of all entity IDs in the knowledge base.

Name	Description
RETURNS	The list of entities in the knowledge base. List[str]
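
A short sketch of inspecting the entities in a knowledge base with `len()` and `get_entity_strings()`. The entity IDs are placeholders, and the result is sorted here because this example makes no assumption about the order in which IDs are returned.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q463035", freq=111, entity_vector=[4.0, 5.0, 6.0])

total_entities = len(kb)                         # number of entities
entity_ids = sorted(kb.get_entity_strings())     # their string IDs

print(total_entities, entity_ids)
```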

InMemoryLookupKB.get_size_aliases method

Get the total number of aliases in the knowledge base.

Name	Description
RETURNS	The number of aliases in the knowledge base. int

InMemoryLookupKB.get_alias_strings method

Get a list of all aliases in the knowledge base.

Name	Description
RETURNS	The list of aliases in the knowledge base. List[str]
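
The alias counterparts can be sketched the same way. The aliases `Douglas` and `Adams` are illustrative, and the list is sorted because the example does not rely on any particular return order.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas", entities=["Q42"], probabilities=[0.6])
kb.add_alias(alias="Adams", entities=["Q42"], probabilities=[0.9])

alias_count = kb.get_size_aliases()          # number of aliases
aliases = sorted(kb.get_alias_strings())     # the alias strings themselves

print(alias_count, aliases)
```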

InMemoryLookupKB.get_candidates method

Given a textual mention as input, retrieve a list of candidate entities of type `Candidate`. Wraps `get_alias_candidates()`.

Name	Description
`mention`	The textual mention or alias. Span
RETURNS	An iterable of relevant `Candidate` objects. Iterable[Candidate]
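
A minimal sketch of candidate retrieval for a `Span`, assuming spaCy v3.5 or later. It uses a blank English pipeline (no trained model required) so that the knowledge base and the document share one vocabulary; the sentence, alias and entity ID are illustrative.

```python
import spacy
from spacy.kb import InMemoryLookupKB

nlp = spacy.blank("en")  # tokenizer only, no model download needed
kb = InMemoryLookupKB(vocab=nlp.vocab, entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas Adams", entities=["Q42"], probabilities=[0.9])

doc = nlp("Douglas Adams wrote a book.")
mention = doc[0:2]  # the Span covering "Douglas Adams"

# The mention text is matched exactly against the stored aliases.
candidates = list(kb.get_candidates(mention))
for candidate in candidates:
    print(candidate.entity_, candidate.alias_, candidate.prior_prob)
```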

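InMemoryLookupKB.get_candidates_batch method

Same as `get_candidates()`, but for an arbitrary number of mentions. The `EntityLinker` component calls `get_candidates_batch()` instead of `get_candidates()` if the config parameter `candidates_batch_size` is greater than or equal to 1.

The default implementation of `get_candidates_batch()` executes `get_candidates()` in a loop. If performance matters to you, we recommend implementing a more efficient way to retrieve candidates for multiple mentions at once.

Name	Description
`mentions`	The textual mentions or aliases. Iterable[Span]
RETURNS	An iterable of iterables of relevant `Candidate` objects. Iterable[Iterable[Candidate]]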
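
The batch variant can be sketched as follows, again assuming spaCy v3.5 or later and a blank English pipeline. One mention matches a stored alias and the other does not, so the second inner result is empty. All names and values are illustrative.

```python
import spacy
from spacy.kb import InMemoryLookupKB

nlp = spacy.blank("en")
kb = InMemoryLookupKB(vocab=nlp.vocab, entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas Adams", entities=["Q42"], probabilities=[0.9])

doc = nlp("Douglas Adams wrote a book.")
mentions = [doc[0:2], doc[4:5]]  # "Douglas Adams" and "book"

# One inner iterable of candidates per mention, in the same order.
batch = [list(cands) for cands in kb.get_candidates_batch(mentions)]
print([[c.entity_ for c in cands] for cands in batch])
```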

InMemoryLookupKB.get_alias_candidates method

Given a textual mention as input, retrieve a list of candidate entities of type `Candidate`.

Name	Description
`alias`	The textual mention or alias. str
RETURNS	The list of relevant `Candidate` objects. List[Candidate]
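
When the mention is already available as a plain string, the lookup might look like the sketch below. The alias and entity IDs are placeholders; an alias that was never registered yields an empty list.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q463035", freq=111, entity_vector=[4.0, 5.0, 6.0])
kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])

known = kb.get_alias_candidates("Douglas")
unknown = kb.get_alias_candidates("Marvin")  # never added as an alias

print(sorted(c.entity_ for c in known))
print(unknown)
```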

InMemoryLookupKB.get_vector method

Given an entity ID, retrieve its pretrained entity vector.

Name	Description
`entity`	The entity ID. str
RETURNS	The entity vector. numpy.ndarray

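InMemoryLookupKB.get_vectors method

Same as `get_vector()`, but for an arbitrary number of entity IDs.

The default implementation of `get_vectors()` executes `get_vector()` in a loop. If performance matters to you, we recommend implementing a more efficient way to retrieve vectors for multiple entities at once.

Name	Description
`entities`	The entity IDs. Iterable[str]
RETURNS	The entity vectors. Iterable[Iterable[numpy.ndarray]]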
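
A brief sketch of retrieving stored vectors, one at a time and for several entity IDs at once. The vector values are placeholders chosen to be exactly representable as floats, and the results are converted with `list()` so the example does not depend on the concrete sequence type that is returned.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q463035", freq=111, entity_vector=[4.0, 5.0, 6.0])

single = list(kb.get_vector("Q42"))

# Vectors come back in the same order as the requested IDs.
several = [list(vec) for vec in kb.get_vectors(["Q42", "Q463035"])]

print(single)
print(several)
```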

InMemoryLookupKB.get_prior_prob method

Given an entity ID and a textual mention, retrieve the prior probability that the mention links to that entity ID.

Name	Description
`entity`	The entity ID. str
`alias`	The textual mention or alias. str
RETURNS	The prior probability of the `alias` referring to the `entity`. float
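
The sketch below reads back a prior probability that was set through `add_alias`. The values are illustrative. Because the stored number may differ from the input by a tiny rounding error, the comparison uses a tolerance rather than exact equality.

```python
import math

from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas", entities=["Q42"], probabilities=[0.6])

prior = kb.get_prior_prob("Q42", "Douglas")

# Compare with a tolerance instead of ==, to allow for rounding in storage.
print(math.isclose(prior, 0.6, abs_tol=1e-6))
```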

InMemoryLookupKB.to_disk method

Save the current state of the knowledge base to a directory.

Name	Description
`path`	A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or `Path`-like objects. Union[str,Path]
`exclude`	List of components to exclude. Iterable[str]

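InMemoryLookupKB.from_disk method

Restore the state of the knowledge base from a given directory. Note that the `Vocab` should also be the same as the one used to create the knowledge base.

Name	Description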
`loc`	A path to a directory. Paths may be either strings or `Path`-like objects. Union[str,Path]
`exclude`	List of components to exclude. Iterable[str]
RETURNS	The modified `KnowledgeBase` object. KnowledgeBase
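
A round trip through `to_disk` and `from_disk` might look like the following sketch. It writes to a temporary directory (the subdirectory name `kb` is an arbitrary choice) and restores into a second knowledge base that shares the same `Vocab`, as the note above requires. Entity and alias values are placeholders.

```python
import tempfile
from pathlib import Path

from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

vocab = Vocab()
kb = InMemoryLookupKB(vocab=vocab, entity_vector_length=3)
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas", entities=["Q42"], probabilities=[0.6])

with tempfile.TemporaryDirectory() as tmp_dir:
    kb_path = Path(tmp_dir) / "kb"  # created by to_disk if it does not exist
    kb.to_disk(kb_path)

    # Restore into a new object that shares the same vocabulary.
    restored = InMemoryLookupKB(vocab=vocab, entity_vector_length=3)
    restored.from_disk(kb_path)

print(len(restored), restored.get_alias_strings())
```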
