SpanRuler

class v3.3

String name: span_ruler, Trainable: no

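Pipeline component for rule-based span and named entity recognition

The span ruler lets you add spans to Doc.spans and/or Doc.ents using token-based rules or exact phrase matches. For usage examples, see the docs on rule-based span matching.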
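As a minimal sketch of the workflow described above, the following adds a span ruler to a blank English pipeline and registers one phrase pattern and one token pattern. The example sentence and the variable name `matches` are illustrative only.

```python
import spacy

# A blank pipeline is enough: the span ruler needs only a tokenizer.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")

patterns = [
    # Phrase pattern (string): exact match on the text
    {"label": "ORG", "pattern": "Apple"},
    # Token pattern (list of dicts): one dict per token
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
]
ruler.add_patterns(patterns)

doc = nlp("Apple is opening its first big office in San Francisco.")
# With the default settings, matches are stored under doc.spans["ruler"].
matches = [(span.text, span.label_) for span in doc.spans["ruler"]]
print(matches)
```

Because `annotate_ents` defaults to `False`, nothing is written to `doc.ents` in this configuration.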

Assigned Attributes

Matches will be saved to Doc.spans[spans_key] as a SpanGroup and/or to Doc.ents, where the annotation is stored in the Token.ent_type and Token.ent_iob fields.

Location	Value
`Doc.spans[spans_key]`	The annotated spans. SpanGroup
`Doc.ents`	The annotated spans. Tuple[Span]
`Token.ent_iob`	An enum encoding of the IOB part of the named entity tag. int
`Token.ent_iob_`	The IOB part of the named entity tag. str
`Token.ent_type`	The label part of the named entity tag (hash). int
`Token.ent_type_`	The label part of the named entity tag. str
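To illustrate the attributes in the table, here is a small sketch that enables `annotate_ents` so matches are also written to `Doc.ents` and to the token-level IOB fields. The sentence and the variable names `ents` and `iob` are chosen for illustration.

```python
import spacy

nlp = spacy.blank("en")
# annotate_ents=True additionally writes the matches to doc.ents
ruler = nlp.add_pipe("span_ruler", config={"annotate_ents": True})
ruler.add_patterns([{"label": "GPE", "pattern": "San Francisco"}])

doc = nlp("She moved to San Francisco")

# Span-level view: the SpanGroup and the entity tuple
group = [(span.text, span.label_) for span in doc.spans["ruler"]]
ents = [(ent.text, ent.label_) for ent in doc.ents]

# Token-level view: IOB tag and entity label per token
iob = [(token.text, token.ent_iob_, token.ent_type_) for token in doc]
for row in iob:
    print(row)
```

The first token of the entity should carry the IOB tag `B` and the following one `I`, both with the label `GPE`.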

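Config and implementation

The default config is defined by the pipeline component factory and describes how the component should be configured. You can override its settings via the config argument on nlp.add_pipe or in your config.cfg.

Setting	Description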
`spans_key`	The spans key to save the spans under. If `None`, no spans are saved. Defaults to `"ruler"`. Optional[str]
`spans_filter`	The optional method to filter spans before they are assigned to doc.spans. Defaults to `None`. Optional[Callable[[Iterable[Span], Iterable[Span]], List[Span]]]
`annotate_ents`	Whether to save spans to doc.ents. Defaults to `False`. bool
`ents_filter`	The method to filter spans before they are assigned to doc.ents. Defaults to `util.filter_chain_spans`. Callable[[Iterable[Span], Iterable[Span]], List[Span]]
`phrase_matcher_attr`	Token attribute to match on, passed to the internal `PhraseMatcher` as `attr`. Defaults to `None`. Optional[Union[int, str]]
`matcher_fuzzy_compare` v3.5	The fuzzy comparison method, passed on to the internal `Matcher`. Defaults to `spacy.matcher.levenshtein.levenshtein_compare`. Callable
`validate`	Whether patterns should be validated, passed to `Matcher` and `PhraseMatcher` as `validate`. Defaults to `False`. bool
`overwrite`	Whether to remove any existing spans under `Doc.spans[spans_key]` if `spans_key` is set, or to remove any ents under `Doc.ents` if `annotate_ents` is set. Defaults to `True`. bool
`scorer`	The scoring method. Defaults to `Scorer.score_spans` for `Doc.spans[spans_key]` with overlapping spans allowed. Optional[Callable]
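The settings above can be overridden when the component is added. The following is a sketch that changes two of them; the key name `"my_spans"` and the example text are assumptions made for illustration.

```python
import spacy

nlp = spacy.blank("en")
config = {
    "spans_key": "my_spans",         # store matches under a custom key
    "phrase_matcher_attr": "LOWER",  # match phrase patterns case-insensitively
}
ruler = nlp.add_pipe("span_ruler", config=config)
ruler.add_patterns([{"label": "ORG", "pattern": "apple"}])

doc = nlp("APPLE and Apple")
texts = [span.text for span in doc.spans["my_spans"]]
print(texts)
```

With `phrase_matcher_attr` set to `"LOWER"`, the lowercase pattern matches both spellings, and nothing is stored under the default `"ruler"` key.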

explosion/spaCy/master/spacy/pipeline/span_ruler.py

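SpanRuler.__init__ method

Initialize the span ruler. If patterns are supplied here, they need to be a list of dictionaries with a "label" and "pattern" key. A pattern can either be a token pattern (list) or a phrase pattern (string). For example: {"label": "ORG", "pattern": "Apple"}.

Name	Description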
`nlp`	The shared nlp object to pass the vocab to the matchers and process phrase patterns. Language
`name`	Instance name of the current pipeline component. Typically passed in automatically from the factory when the component is added. Used to disable the current span ruler while creating phrase patterns with the nlp object. str
keyword-only
`spans_key`	The spans key to save the spans under. If `None`, no spans are saved. Defaults to `"ruler"`. Optional[str]
`spans_filter`	The optional method to filter spans before they are assigned to doc.spans. Defaults to `None`. Optional[Callable[[Iterable[Span], Iterable[Span]], List[Span]]]
`annotate_ents`	Whether to save spans to doc.ents. Defaults to `False`. bool
`ents_filter`	The method to filter spans before they are assigned to doc.ents. Defaults to `util.filter_chain_spans`. Callable[[Iterable[Span], Iterable[Span]], List[Span]]
`phrase_matcher_attr`	Token attribute to match on, passed to the internal PhraseMatcher as `attr`. Defaults to `None`. Optional[Union[int, str]]
`matcher_fuzzy_compare` v3.5	The fuzzy comparison method, passed on to the internal `Matcher`. Defaults to `spacy.matcher.levenshtein.levenshtein_compare`. Callable
`validate`	Whether patterns should be validated, passed to Matcher and PhraseMatcher as `validate`. Defaults to `False`. bool
`overwrite`	Whether to remove any existing spans under `Doc.spans[spans_key]` if `spans_key` is set, or to remove any ents under `Doc.ents` if `annotate_ents` is set. Defaults to `True`. bool
`scorer`	The scoring method. Defaults to `Scorer.score_spans` for `Doc.spans[spans_key]` with overlapping spans allowed. Optional[Callable]
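Besides `nlp.add_pipe`, the component can be constructed from the class directly. A minimal sketch, assuming the import path `spacy.pipeline.SpanRuler` and a hypothetical key name `"rules"`:

```python
import spacy
from spacy.pipeline import SpanRuler

nlp = spacy.blank("en")

# Construction from the class; keyword-only settings are passed explicitly.
ruler = SpanRuler(nlp, spans_key="rules", overwrite=True)
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])

print(ruler.key, len(ruler))
```

A ruler built this way is not part of `nlp.pipeline`, so it has to be applied to a `Doc` by calling it directly.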

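SpanRuler.initialize method

Initialize the component with data. This method is used before training to load in rules from a pattern file. It is typically called by Language.initialize and lets you customize the arguments it receives via the [initialize.components] block in the config. Any existing patterns are removed on initialization.

Name	Description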
`get_examples`	Function that returns gold-standard annotations in the form of `Example` objects. Not used by the `SpanRuler`. Callable[[], Iterable[Example]]
keyword-only
`nlp`	The current `nlp` object. Defaults to `None`. Optional[Language]
`patterns`	The list of patterns. Defaults to `None`. Optional[Sequence[Dict[str, Union[str, List[Dict[str, Any]]]]]]
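The following sketch calls `initialize` by hand to show that existing patterns are replaced. In a training run this call is normally made for you; the labels used here are illustrative.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")

# A pattern added before initialization...
ruler.add_patterns([{"label": "OLD", "pattern": "old pattern"}])

# ...is removed when initialize() loads the new pattern list.
# get_examples is not used by the SpanRuler, so an empty callable suffices.
patterns = [{"label": "ORG", "pattern": "Apple"}]
ruler.initialize(lambda: [], patterns=patterns)

labels_after = ruler.labels
print(labels_after)
```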

SpanRuler.__len__ method

The number of all patterns added to the span ruler.

Name	Description
RETURNS	The number of patterns. int

SpanRuler.__contains__ method

Whether a label is present in the patterns.

Name	Description
`label`	The label to check. str
RETURNS	Whether the span ruler contains the label. bool
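A short sketch of the length and membership checks, using two illustrative patterns:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
empty_len = len(ruler)  # no patterns yet

ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": "Paris"},
])

n_patterns = len(ruler)          # counts patterns, not distinct labels
has_org = "ORG" in ruler         # label used by a pattern
has_person = "PERSON" in ruler   # label never added
print(empty_len, n_patterns, has_org, has_person)
```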

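SpanRuler.__call__ method

Find matches in the Doc and add them to doc.spans[spans_key] and/or doc.ents. Typically, this happens automatically after the component has been added to the pipeline using nlp.add_pipe. If the span ruler was initialized with overwrite=True, existing spans and entities will be removed.

Name	Description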
`doc`	The `Doc` object to process, e.g. the `Doc` in the pipeline. Doc
RETURNS	The modified `Doc` with added spans/entities. Doc
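To make the overwrite behavior concrete, this sketch applies a standalone ruler to a `Doc` that already has a span stored under the same key. The pre-existing `GPE` span is set up by hand purely for illustration.

```python
import spacy
from spacy.pipeline import SpanRuler
from spacy.tokens import Span

nlp = spacy.blank("en")
ruler = SpanRuler(nlp, overwrite=True)
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])

# Tokenize only, then store a span under the ruler's key by hand.
doc = nlp.make_doc("Apple hired a chef from Paris")
doc.spans["ruler"] = [Span(doc, 5, 6, label="GPE")]  # "Paris"

# Calling the component directly modifies and returns the Doc.
doc = ruler(doc)
result = [(span.text, span.label_) for span in doc.spans["ruler"]]
print(result)
```

With `overwrite=True`, the hand-made span is dropped and only the ruler's own match remains.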

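SpanRuler.add_patterns method

Add patterns to the span ruler. A pattern can either be a token pattern (list of dicts) or a phrase pattern (string). For more details, see the usage guide on rule-based matching.

Name	Description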
`patterns`	The patterns to add. List[Dict[str, Union[str, List[dict]]]]
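As a sketch, patterns can be added in several calls, and both pattern types can be mixed. The text and variable names below are illustrative.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")

# First batch: a phrase pattern (string)
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])
# Second batch: a token pattern (list of dicts); earlier patterns are kept
ruler.add_patterns([
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}
])

doc = nlp("Apple opened an office in san francisco")
found = [(span.text, span.label_) for span in doc.spans["ruler"]]
print(len(ruler), found)
```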

SpanRuler.remove method

Remove patterns by label from the span ruler. Raises a ValueError if the label does not exist in any patterns.

Name	Description
`label`	The label of the pattern rule. str
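A minimal sketch of removal by label, including the error case for a label that was never added. The flag `raised` is a hypothetical helper variable.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": "Paris"},
])

ruler.remove("ORG")  # drops every pattern labeled ORG
remaining = ruler.labels

raised = False
try:
    ruler.remove("PERSON")  # no pattern ever used this label
except ValueError:
    raised = True
print(remaining, raised)
```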

SpanRuler.remove_by_id method

Remove patterns by ID from the span ruler. Raises a ValueError if the ID does not exist in any patterns.

Name	Description
`pattern_id`	The ID of the pattern rule. str
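The sketch below assumes patterns carry an optional `"id"` key, which is what `remove_by_id` looks up. The ID values are illustrative.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple", "id": "apple"},
    {"label": "GPE", "pattern": "San Francisco", "id": "sf"},
])

ruler.remove_by_id("apple")
remaining_ids = ruler.ids

raised = False
try:
    ruler.remove_by_id("unknown-id")  # no pattern has this ID
except ValueError:
    raised = True
print(remaining_ids, raised)
```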

SpanRuler.clear method

Remove all patterns from the span ruler.

SpanRuler.to_disk method

Save the span ruler patterns to a directory. The patterns will be saved as newline-delimited JSON (JSONL).

Name	Description
`path`	A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or `Path`-like objects. Union[str,Path]

SpanRuler.from_disk method

Load the span ruler from a path.

Name	Description
`path`	A path to a directory. Paths may be either strings or `Path`-like objects. Union[str,Path]
RETURNS	The modified `SpanRuler` object. SpanRuler
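A round trip through `to_disk` and `from_disk` can be sketched as follows. The temporary directory and the subdirectory name `span_ruler` are assumptions for the example.

```python
import tempfile
from pathlib import Path

import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "span_ruler"  # created by to_disk if missing
    ruler.to_disk(path)

    # Load the saved patterns into a fresh ruler in a fresh pipeline.
    new_nlp = spacy.blank("en")
    new_ruler = new_nlp.add_pipe("span_ruler")
    new_ruler.from_disk(path)

restored = new_ruler.patterns
print(restored)
```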

SpanRuler.to_bytes method

Serialize the span ruler to a bytestring.

Name	Description
RETURNS	The serialized patterns. bytes

SpanRuler.from_bytes method

Load the pipe from a bytestring. Modifies the object in place and returns it.

Name	Description
`bytes_data`	The bytestring to load. bytes
RETURNS	The modified `SpanRuler` object. SpanRuler
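The byte-level round trip works the same way without touching the file system. A minimal sketch, with the second pipeline created only to receive the data:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])

data = ruler.to_bytes()  # serialized patterns

new_nlp = spacy.blank("en")
new_ruler = new_nlp.add_pipe("span_ruler")
returned = new_ruler.from_bytes(data)  # modifies new_ruler in place

print(type(data).__name__, returned.patterns)
```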

SpanRuler.labels property

All labels present in the match patterns.

Name	Description
RETURNS	The string labels. Tuple[str, …]

SpanRuler.ids property

All IDs present in the id property of the match patterns.

Name	Description
RETURNS	The string IDs. Tuple[str, …]

SpanRuler.patterns property

All patterns that were added to the span ruler.

Name	Description
RETURNS	The original patterns, one dictionary per pattern. List[Dict[str, Union[str, dict]]]
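The three properties can be inspected together. In this sketch only one of the two patterns has an `"id"` key, to show that patterns without an ID do not contribute to `ids`.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
patterns = [
    {"label": "ORG", "pattern": "Apple", "id": "apple"},
    {"label": "GPE", "pattern": "Paris"},  # no "id" key
]
ruler.add_patterns(patterns)

labels = ruler.labels      # distinct labels across all patterns
ids = ruler.ids            # distinct IDs; patterns without an ID are skipped
stored = ruler.patterns    # the pattern dicts as they were added
print(labels, ids, stored)
```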

Attributes

Name	Description
`key`	The spans key that spans are saved under. Optional[str]
`matcher`	The underlying matcher used to process token patterns. Matcher
`phrase_matcher`	The underlying phrase matcher used to process phrase patterns. PhraseMatcher
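As a sketch of how these attributes relate to the patterns, the example below adds one token pattern and one phrase pattern and then checks which underlying matcher received each. The rule counts rely on `len()` of the matchers reporting the number of rules added.

```python
import spacy
from spacy.matcher import Matcher, PhraseMatcher

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},               # phrase pattern
    {"label": "GPE", "pattern": [{"LOWER": "paris"}]},  # token pattern
])

key = ruler.key                            # spans key in use
n_token_rules = len(ruler.matcher)         # rules in the Matcher
n_phrase_rules = len(ruler.phrase_matcher) # rules in the PhraseMatcher
print(key, n_token_rules, n_phrase_rules)
```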
