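Example

class v3.0

A single training instance

An `Example` holds the information for one training instance. It stores two `Doc` objects: one holding the gold-standard reference data, and one holding the predictions of the pipeline. An `Alignment` object stores the alignment between these two documents, as their tokenization can differ.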

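Example.__init__ method

Construct an `Example` object from the `predicted` document and the `reference` document. If `alignment` is `None`, it is initialized from the words in both documents.

Name	Description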
`predicted`	The document containing (partial) predictions. Cannot be `None`. Doc
`reference`	The document containing gold-standard annotations. Cannot be `None`. Doc
keyword-only
`alignment`	An object holding the alignment between the tokens of the `predicted` and `reference` documents. Optional[Alignment]
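
As a minimal sketch of the constructor, the snippet below builds two `Doc` objects over the same words and wraps them in an `Example`. The shared empty `Vocab` and the sample words are illustrative assumptions; in practice the two documents would usually come from a pipeline and a gold-standard corpus.

```python
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

# Illustrative data: both documents share one vocab and one tokenization.
vocab = Vocab()
words = ["hello", "world", "!"]
spaces = [True, False, False]

predicted = Doc(vocab, words=words, spaces=spaces)
reference = Doc(vocab, words=words, spaces=spaces)

# alignment is not passed, so it is derived from the words in both docs.
example = Example(predicted, reference)
print(example.text)
```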

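Example.from_dict classmethod

Construct an `Example` object from the `predicted` document and the reference annotations provided as a dictionary. For more details on the required format, see the training format documentation.

Name	Description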
`predicted`	The document containing (partial) predictions. Cannot be `None`. Doc
`example_dict`	The gold-standard annotations as a dictionary. Cannot be `None`. Dict[str, Any]
RETURNS	The newly constructed object. Example
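
The following sketch shows one way to call `Example.from_dict`, assuming a reference tokenization that differs from the predicted one. The words and tags are made-up sample data, and the empty `Vocab` stands in for a real pipeline's vocabulary.

```python
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

# Predicted doc: "sunscreen" is a single token here.
predicted = Doc(Vocab(), words=["Apply", "some", "sunscreen"])

# Gold-standard annotations use a different tokenization ("sun" + "screen").
token_ref = ["Apply", "some", "sun", "screen"]
tags_ref = ["VERB", "DET", "NOUN", "NOUN"]
example = Example.from_dict(predicted, {"words": token_ref, "tags": tags_ref})

print([token.text for token in example.reference])
```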

Example.text property

The text of the `predicted` document in this `Example`.

Name	Description
RETURNS	The text of the `predicted` document. str

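Example.predicted property

The `Doc` holding the predictions. Occasionally also referred to as `example.x`.

Name	Description
RETURNS	The document containing (partial) predictions. Doc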

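Example.reference property

The `Doc` holding the gold-standard annotations. Occasionally also referred to as `example.y`.

Name	Description
RETURNS	The document containing gold-standard annotations. Doc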

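Example.alignment property

The `Alignment` object mapping the tokens of the `predicted` document to those of the `reference` document.

Name	Description
RETURNS	The object holding the alignment between the tokens of the `predicted` and `reference` documents. Alignment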
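
To make the mapping concrete, here is a small sketch that reads the reference-to-predicted direction of the alignment. The sample words are assumptions, and the array is flattened with NumPy as a precaution in case `data` is not one-dimensional in the installed version.

```python
import numpy
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

predicted = Doc(Vocab(), words=["Apply", "some", "sunscreen"])
example = Example.from_dict(predicted, {"words": ["Apply", "some", "sun", "screen"]})

alignment = example.alignment
# For each reference token, the index of the predicted token it aligns to.
y2x = numpy.asarray(alignment.y2x.data).ravel().tolist()
print(y2x)
```

Both "sun" and "screen" in the reference point at the single predicted token "sunscreen", so the last two entries share an index.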

Example.get_aligned method

Get the aligned view of a certain token attribute, denoted by its integer ID or string name.

Name	Description
`field`	Attribute ID or string name. Union[int, str]
`as_string`	Whether or not to return the list of values as strings. Defaults to `False`. bool
RETURNS	List of integer values, or string values if `as_string` is `True`. Union[List[int], List[str]]
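
A short sketch of `get_aligned` with `as_string=True`, using made-up words and tags. The reference has four tokens, but the aligned view has one value per token of the predicted document.

```python
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

predicted = Doc(Vocab(), words=["Apply", "some", "sunscreen"])
token_ref = ["Apply", "some", "sun", "screen"]
tags_ref = ["VERB", "DET", "NOUN", "NOUN"]
example = Example.from_dict(predicted, {"words": token_ref, "tags": tags_ref})

# Gold tags projected onto the predicted tokenization.
aligned_tags = example.get_aligned("TAG", as_string=True)
print(aligned_tags)
```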

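Example.get_aligned_parse method

Get the aligned view of the dependency parse. If `projectivize` is set to `True`, non-projective dependency trees are made projective through the pseudo-projective dependency parsing algorithm by Nivre and Nilsson (2005).

Name	Description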
`projectivize`	Whether or not to projectivize the dependency trees. Defaults to `True`. bool
RETURNS	The aligned head indices and the aligned dependency labels, as a pair of lists. Tuple[List[int], List[str]]
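
The effect of `projectivize` can be sketched with a small non-projective tree. The sentence, head indices and the generic `"dep"` labels are illustrative assumptions; the call is unpacked into a list of heads and a list of labels.

```python
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

predicted = Doc(Vocab(), words=["He", "pretty", "quickly", "walks", "away"])
# "away" attaches to "quickly", an arc that crosses the root "walks".
heads = [3, 2, 3, 3, 2]
deps = ["dep"] * len(heads)
example = Example.from_dict(predicted, {"heads": heads, "deps": deps})

proj_heads, proj_labels = example.get_aligned_parse(projectivize=True)
nonproj_heads, nonproj_labels = example.get_aligned_parse(projectivize=False)
print(proj_heads)     # "away" is re-attached to "walks"
print(nonproj_heads)  # original heads are kept
```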

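Example.get_aligned_ner method

Get the aligned view of the NER BILUO tags.

Name	Description
RETURNS	List of BILUO values, denoting whether tokens are part of an NER annotation or not. List[str]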
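
As an illustration, the sketch below aligns character-offset entities from a reference tokenization onto a differently tokenized predicted document. The words, offsets and labels are sample data chosen for the example.

```python
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

# Predicted doc splits "Mrs Smith" but keeps "New York" as one token.
predicted = Doc(Vocab(), words=["Mrs", "Smith", "flew", "to", "New York"])

# Reference does the opposite; entities are given as character offsets.
gold_words = ["Mrs Smith", "flew", "to", "New", "York"]
entities = [(0, 9, "PERSON"), (18, 26, "LOC")]
example = Example.from_dict(predicted, {"words": gold_words, "entities": entities})

ner_tags = example.get_aligned_ner()
print(ner_tags)
```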

Example.get_aligned_spans_y2x method

Get the aligned view of any set of `Span` objects defined over `Example.reference`. The resulting span indices align to the tokenization of `Example.predicted`.

Name	Description
`y_spans`	`Span` objects aligned to the tokenization of `reference`. Iterable[Span]
`allow_overlap`	Whether the resulting `Span` objects may overlap or not. Set to `False` by default. bool
RETURNS	`Span` objects aligned to the tokenization of `predicted`. List[Span]
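
A minimal sketch of mapping reference spans onto the predicted tokenization, assuming a predicted document that merges "Mr and Mrs Smith" into one token. All words and offsets are illustrative.

```python
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

predicted = Doc(Vocab(), words=["Mr and Mrs Smith", "flew", "to", "New York"])
tokens_ref = ["Mr", "and", "Mrs", "Smith", "flew", "to", "New", "York"]
entities = [(0, 16, "PERSON")]
example = Example.from_dict(predicted, {"words": tokens_ref, "entities": entities})

# The entity covers four tokens in the reference ...
ents_ref = example.reference.ents
ref_bounds = [(ent.start, ent.end) for ent in ents_ref]

# ... and a single token once aligned to the predicted tokenization.
ents_y2x = example.get_aligned_spans_y2x(ents_ref)
pred_bounds = [(ent.start, ent.end) for ent in ents_y2x]
print(ref_bounds, pred_bounds)
```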

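Example.get_aligned_spans_x2y method

Get the aligned view of any set of `Span` objects defined over `Example.predicted`. The resulting span indices align to the tokenization of `Example.reference`. This method is particularly useful for assessing the accuracy of predicted entities against the original gold-standard annotation.

Name	Description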
`x_spans`	`Span` objects aligned to the tokenization of `predicted`. Iterable[Span]
`allow_overlap`	Whether the resulting `Span` objects may overlap or not. Set to `False` by default. bool
RETURNS	`Span` objects aligned to the tokenization of `reference`. List[Span]
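
The reverse direction can be sketched the same way. Here the predicted entity is set by hand to stand in for the output of an NER component, which is an assumption made to keep the example self-contained.

```python
from spacy.tokens import Doc, Span
from spacy.training import Example
from spacy.vocab import Vocab

words = ["Mr", "and", "Mrs", "Smith", "flew", "to", "New", "York"]
predicted = Doc(Vocab(), words=words)
# Stand-in for a model prediction: "Mr and Mrs Smith" as one entity.
predicted.ents = [Span(predicted, 0, 4, label="PERSON")]

tokens_ref = ["Mr and Mrs", "Smith", "flew", "to", "New York"]
example = Example.from_dict(predicted, {"words": tokens_ref})

ents_pred = example.predicted.ents
ents_x2y = example.get_aligned_spans_x2y(ents_pred)
ref_bounds = [(ent.start, ent.end) for ent in ents_x2y]
print(ref_bounds)
```

The four predicted tokens of the entity correspond to the first two tokens of the reference, so the aligned span covers reference tokens 0 and 1.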

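Example.to_dict method

Return a dictionary representation of the reference annotation contained in this `Example`.

Name	Description
RETURNS	Dictionary representation of the reference annotation. Dict[str, Any]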
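
A brief sketch of exporting the reference annotation, with sample words and tags. The exact layout of the returned dictionary is not specified above, so the snippet only inspects its top-level keys rather than assuming a particular structure.

```python
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

predicted = Doc(Vocab(), words=["Apply", "some", "sunscreen"])
token_ref = ["Apply", "some", "sun", "screen"]
tags_ref = ["VERB", "DET", "NOUN", "NOUN"]
example = Example.from_dict(predicted, {"words": token_ref, "tags": tags_ref})

annotations = example.to_dict()
# Inspect what was exported without relying on a specific schema.
print(sorted(annotations.keys()))
```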

Example.split_sents method

Split one `Example` into multiple `Example` objects, one for each sentence.

Name	Description
RETURNS	List of `Example` objects, one for each original sentence. List[Example]
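
Sentence splitting can be sketched as follows, assuming sentence boundaries are supplied through `sent_starts` in the reference annotations. The sentence itself is sample data.

```python
from spacy.tokens import Doc
from spacy.training import Example
from spacy.vocab import Vocab

words = ["I", "went", "yesterday", "had", "lots", "of", "fun"]
predicted = Doc(Vocab(), words=words)
# True marks the first token of each reference sentence.
sents_ref = [True, False, False, True, False, False, False]
example = Example.from_dict(predicted, {"words": words, "sent_starts": sents_ref})

split_examples = example.split_sents()
for sub_example in split_examples:
    print([token.text for token in sub_example.predicted])
```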

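Alignment v3.0

Calculate alignment tables between two tokenizations.

Alignment attributes

Alignment attributes are managed using `AlignmentArray`, a simplified version of Thinc's `Ragged` type that only supports the `data` and `length` attributes.

Name	Description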
`x2y`	The `AlignmentArray` object holding the alignment from `x` to `y`. AlignmentArray
`y2x`	The `AlignmentArray` object holding the alignment from `y` to `x`. AlignmentArray

Alignment.from_strings function

Name	Description
`A`	String values of candidate tokens to align. List[str]
`B`	String values of reference tokens to align. List[str]
RETURNS	An `Alignment` object describing the alignment. Alignment
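
As a hedged illustration of `Alignment.from_strings`, the sketch below aligns two tokenizations of the same text. The token lists are made up, and `data` is flattened with NumPy as a precaution in case it is not one-dimensional in the installed version.

```python
import numpy
from spacy.training import Alignment

# Two tokenizations of "obama's podcast" (illustrative).
tokens_a = ["obama", "'", "s", "podcast"]
tokens_b = ["obama", "'s", "podcast"]
alignment = Alignment.from_strings(tokens_a, tokens_b)

# For each token in A, the index of the token in B it aligns to, and vice versa.
a2b = numpy.asarray(alignment.x2y.data).ravel().tolist()
b2a = numpy.asarray(alignment.y2x.data).ravel().tolist()
print(a2b)
print(b2a)
```

The tokens `'` and `s` in the first list both map to `'s` in the second, which is why index 1 appears twice in `a2b`.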
