Pipeline

SpanRuler

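class (new in v3.3)
String name: span_ruler
Trainable: No
Pipeline component for rule-based span and named entity recognition

The span ruler lets you add spans to Doc.spans and/or Doc.ents using token-based rules or exact phrase matches. For usage examples, see the docs on rule-based span matching.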

Assigned Attributes

Matches will be saved to Doc.spans[spans_key] as a SpanGroup and/or to Doc.ents, where the annotation is saved in the Token.ent_type and Token.ent_iob fields.

Location | Value | Type
Doc.spans[spans_key] | The annotated spans. | SpanGroup
Doc.ents | The annotated spans. | Tuple[Span]
Token.ent_iob | An enum encoding of the IOB part of the named entity tag. | int
Token.ent_iob_ | The IOB part of the named entity tag. | str
Token.ent_type | The label part of the named entity tag (hash). | int
Token.ent_type_ | The label part of the named entity tag. | str
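The following is a minimal sketch of where matches end up, assuming spaCy v3.3 or later and a blank English pipeline (spacy.blank("en"), so no trained model is needed). The patterns and the sentence are made up for illustration. With annotate_ents enabled, each match is written both to the default "ruler" span group and to Doc.ents, and the token-level IOB fields are filled in.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components required
nlp = spacy.blank("en")
# annotate_ents=True additionally writes the matches to doc.ents
ruler = nlp.add_pipe("span_ruler", config={"annotate_ents": True})
ruler.add_patterns([
    # Phrase pattern (string), illustrative
    {"label": "ORG", "pattern": "Apple"},
    # Token pattern (list of token dicts), illustrative
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
])

doc = nlp("Apple is opening its first big office in San Francisco.")

# Spans stored in the SpanGroup under the default key "ruler"
span_group = [(span.text, span.label_) for span in doc.spans["ruler"]]
# The same matches assigned as named entities
entities = [(ent.text, ent.label_) for ent in doc.ents]
# IOB part and label part of the entity tag on "San" and "Francisco"
iob_tags = [(token.text, token.ent_iob_, token.ent_type_) for token in doc[8:10]]

print(span_group)
print(entities)
print(iob_tags)
```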

Config and implementation

The default config is defined by the pipeline component factory and describes how the component should be configured. You can override its settings via the config argument on nlp.add_pipe or in your config.cfg.

Setting | Description | Type
spans_key | The spans key to save the spans under. If None, no spans are saved. Defaults to "ruler". | Optional[str]
spans_filter | The optional method to filter spans before they are assigned to doc.spans. Defaults to None. | Optional[Callable[[Iterable[Span], Iterable[Span]], List[Span]]]
annotate_ents | Whether to save spans to doc.ents. Defaults to False. | bool
ents_filter | The method to filter spans before they are assigned to doc.ents. Defaults to util.filter_chain_spans. | Callable[[Iterable[Span], Iterable[Span]], List[Span]]
phrase_matcher_attr | Token attribute to match on, passed to the internal PhraseMatcher as attr. Defaults to None. | Optional[Union[int, str]]
matcher_fuzzy_compare (new in v3.5) | The fuzzy comparison method, passed on to the internal Matcher. Defaults to spacy.matcher.levenshtein.levenshtein_compare. | Callable
validate | Whether patterns should be validated, passed to Matcher and PhraseMatcher as validate. Defaults to False. | bool
overwrite | Whether to remove any existing spans under Doc.spans[spans_key] if spans_key is set, or to remove any ents under Doc.ents if annotate_ents is set. Defaults to True. | bool
scorer | The scoring method. Defaults to Scorer.score_spans for Doc.spans[spans_key] with overlapping spans allowed. | Optional[Callable]
explosion/spaCy/master/spacy/pipeline/span_ruler.py
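As an illustration, the settings listed for this component can be overridden through the config argument of nlp.add_pipe. This sketch assumes spaCy v3.3 or later; the key name "my_spans" is an arbitrary, hypothetical choice.

```python
import spacy

nlp = spacy.blank("en")
# Override some of the component's default settings
config = {
    "spans_key": "my_spans",  # store matches in doc.spans["my_spans"]
    "validate": True,         # validate patterns as they are added
    "overwrite": False,       # keep spans already stored under the key
}
ruler = nlp.add_pipe("span_ruler", config=config)
print(ruler.key)
print(nlp.get_pipe_config("span_ruler"))
```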

SpanRuler.__init__ method

Initialize the span ruler. If patterns are supplied here, they need to be a list of dictionaries with a "label" and "pattern" key. A pattern can either be a token pattern (list) or a phrase pattern (string). For example: {"label": "ORG", "pattern": "Apple"}.

Name | Description | Type
nlp | The shared nlp object to pass the vocab to the matchers and process phrase patterns. | Language
name | Instance name of the current pipeline component. Typically passed in automatically from the factory when the component is added. Used to disable the current span ruler while creating phrase patterns with the nlp object. | str
Keyword-only:
spans_key | The spans key to save the spans under. If None, no spans are saved. Defaults to "ruler". | Optional[str]
spans_filter | The optional method to filter spans before they are assigned to doc.spans. Defaults to None. | Optional[Callable[[Iterable[Span], Iterable[Span]], List[Span]]]
annotate_ents | Whether to save spans to doc.ents. Defaults to False. | bool
ents_filter | The method to filter spans before they are assigned to doc.ents. Defaults to util.filter_chain_spans. | Callable[[Iterable[Span], Iterable[Span]], List[Span]]
phrase_matcher_attr | Token attribute to match on, passed to the internal PhraseMatcher as attr. Defaults to None. | Optional[Union[int, str]]
matcher_fuzzy_compare (new in v3.5) | The fuzzy comparison method, passed on to the internal Matcher. Defaults to spacy.matcher.levenshtein.levenshtein_compare. | Callable
validate | Whether patterns should be validated, passed to Matcher and PhraseMatcher as validate. Defaults to False. | bool
overwrite | Whether to remove any existing spans under Doc.spans[spans_key] if spans_key is set, or to remove any ents under Doc.ents if annotate_ents is set. Defaults to True. | bool
scorer | The scoring method. Defaults to Scorer.score_spans for Doc.spans[spans_key] with overlapping spans allowed. | Optional[Callable]
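A sketch of constructing the component from the class rather than through nlp.add_pipe, assuming spaCy v3.3 or later. The instance name "standalone_ruler" and the key "custom" are hypothetical. Because a ruler built this way is not part of nlp.pipeline, it has to be called on a Doc explicitly.

```python
import spacy
from spacy.pipeline import SpanRuler

nlp = spacy.blank("en")
# Build the component directly; it is NOT added to nlp.pipeline
ruler = SpanRuler(nlp, name="standalone_ruler", spans_key="custom", overwrite=True)
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])

# Tokenize only, then apply the ruler explicitly
doc = ruler(nlp.make_doc("Apple is looking at buying a startup."))
matches = [(span.text, span.label_) for span in doc.spans["custom"]]
print(matches)
print(nlp.pipe_names)  # the standalone ruler is not listed here
```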

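SpanRuler.initialize method

Initialize the component with data; this is used before training to load the rules from a pattern file. This method is typically called by Language.initialize and lets you customize the arguments it receives via the [initialize.components] block in the config. Any existing patterns are removed on initialization.

Name | Description | Type
get_examples | Function that returns gold-standard annotations in the form of Example objects. Not used by the SpanRuler. | Callable[[], Iterable[Example]]
Keyword-only:
nlp | The current nlp object. Defaults to None. | Optional[Language]
patterns | The list of patterns. Defaults to None. | Optional[Sequence[Dict[str, Union[str, List[Dict[str, Any]]]]]]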
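A sketch of calling initialize by hand, assuming spaCy v3.3 or later. In a training setup the patterns would typically be supplied through the [initialize.components] block instead. The "OLD" pattern is a made-up placeholder that is added first to show that initialization removes existing patterns.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
# Placeholder pattern added before initialization (hypothetical)
ruler.add_patterns([{"label": "OLD", "pattern": "obsolete"}])

patterns = [
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
]
# get_examples is not used by the SpanRuler, so an empty callable is enough;
# initialize() removes existing patterns before adding the new ones
ruler.initialize(lambda: [], nlp=nlp, patterns=patterns)
print(len(ruler), ruler.labels)
```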

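SpanRuler.__len__ method

The number of all patterns added to the span ruler.

Name | Description | Type
RETURNS | The number of patterns. | int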
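A short illustration, assuming spaCy v3.3 or later and made-up patterns: every pattern dictionary counts toward the length, even when several share a label.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
n_before = len(ruler)  # no patterns added yet

ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "ORG", "pattern": "Microsoft"},
])
# Each pattern dictionary counts once, even though both share the label "ORG"
n_after = len(ruler)
print(n_before, n_after)
```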

SpanRuler.__contains__ method

Whether a label is present in the patterns.

Name | Description | Type
label | The label to check. | str
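A minimal sketch of the membership check, assuming spaCy v3.3 or later; the labels are illustrative.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])

has_org = "ORG" in ruler        # a pattern with this label exists
has_person = "PERSON" in ruler  # no pattern uses this label
print(has_org, has_person)
```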

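SpanRuler.__call__ method

Find matches in the Doc and add them to doc.spans[spans_key] and/or doc.ents. Typically, this happens automatically after the component has been added to the pipeline using nlp.add_pipe. If the span ruler was initialized with overwrite=True, existing spans and entities will be removed.

Name | Description | Type
doc | The Doc object to process, e.g. the Doc in the pipeline. | Doc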
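Running the pipeline calls the component for you, but it can also be applied to a Doc directly. This sketch, assuming spaCy v3.3 or later, sets overwrite=False so that a span already stored under the default "ruler" key survives; the manually added PERSON span and the sentence are illustrative.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
# overwrite=False: keep whatever is already stored under the spans key
ruler = nlp.add_pipe("span_ruler", config={"overwrite": False})
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])

# Tokenize only, then pre-populate doc.spans["ruler"] by hand (illustrative)
doc = nlp.make_doc("Apple hired Tim Cook.")
doc.spans["ruler"] = [Span(doc, 2, 4, label="PERSON")]  # "Tim Cook"

# Apply the component directly; nlp(text) would do this as part of the pipeline
doc = ruler(doc)
result = [(span.text, span.label_) for span in doc.spans["ruler"]]
print(result)
```

With the default overwrite=True, the manually added PERSON span would be removed and only the ruler's own matches would remain under the key.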

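SpanRuler.add_patterns method

Add patterns to the span ruler. A pattern can either be a token pattern (list of dictionaries) or a phrase pattern (string). For more details, see the usage guide on rule-based matching.

Name | Description | Type
patterns | The patterns to add. | List[Dict[str, Union[str, List[dict]]]]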
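The sketch below mixes both pattern types, assuming spaCy v3.3 or later and hypothetical labels. String patterns are handled by the internal PhraseMatcher and lists of token dictionaries by the internal Matcher.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
patterns = [
    # Phrase pattern: exact text, matched by the internal PhraseMatcher
    {"label": "ORG", "pattern": "Apple"},
    # Token pattern: one dict per token, matched by the internal Matcher
    {"label": "PRODUCT", "pattern": [{"LOWER": "iphone"}, {"IS_DIGIT": True}]},
]
ruler.add_patterns(patterns)

doc = nlp("Apple released the iPhone 15 today.")
found = [(span.text, span.label_) for span in doc.spans["ruler"]]
print(found)
```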

SpanRuler.remove method

Remove patterns by label from the span ruler. A ValueError is raised if the label does not exist in any patterns.

Name | Description | Type
label | The label of the pattern rule. | str
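A sketch of removing patterns by label, assuming spaCy v3.3 or later; the labels are illustrative. The second call uses a label that no pattern has, so it is expected to raise ValueError.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": "Paris"},
])

ruler.remove("GPE")  # drops every pattern labeled GPE
remaining = ruler.labels

try:
    ruler.remove("PERSON")  # no pattern has this label
    missing_label_raised = False
except ValueError:
    missing_label_raised = True
print(remaining, missing_label_raised)
```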

SpanRuler.remove_by_id method

Remove patterns by ID from the span ruler. A ValueError is raised if the ID does not exist in any patterns.

Name | Description | Type
pattern_id | The ID of the pattern rule. | str
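Patterns can carry an optional "id" key, and remove_by_id drops every pattern with that ID regardless of label. A sketch assuming spaCy v3.3 or later; the IDs "apple" and "msft" are hypothetical.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple", "id": "apple"},
    {"label": "PRODUCT", "pattern": "iPhone", "id": "apple"},
    {"label": "ORG", "pattern": "Microsoft", "id": "msft"},
])

ruler.remove_by_id("apple")  # removes both patterns with ID "apple"

doc = nlp("Apple makes the iPhone and Microsoft makes Windows.")
found = [(span.text, span.label_) for span in doc.spans["ruler"]]
print(len(ruler), ruler.ids, found)
```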

SpanRuler.clear method

Remove all patterns from the span ruler.
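A minimal sketch, assuming spaCy v3.3 or later and illustrative patterns:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": [{"LOWER": "paris"}]},
])

ruler.clear()  # removes all patterns, token and phrase alike
print(len(ruler), ruler.labels)
```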

SpanRuler.to_disk method

Save the span ruler patterns to a directory. The patterns will be saved as newline-delimited JSON (JSONL).

Name | Description | Type
path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. | Union[str, Path]

SpanRuler.from_disk method

Load the span ruler from a path.

Name | Description | Type
path | A path to a directory. Paths may be either strings or Path-like objects. | Union[str, Path]

SpanRuler.to_bytes method

Serialize the span ruler to a bytestring.

Name | Description | Type
RETURNS | The serialized patterns. | bytes

SpanRuler.from_bytes method

Load the pipe from a bytestring. Modifies the object in place and returns it.

Name | Description | Type
bytes_data | The bytestring to load. | bytes
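The byte-level equivalent, again a sketch assuming spaCy v3.3 or later with an illustrative pattern: the patterns are serialized from one pipeline and loaded into a ruler in another.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])
ruler_bytes = ruler.to_bytes()  # serialized patterns

new_nlp = spacy.blank("en")
new_ruler = new_nlp.add_pipe("span_ruler")
loaded = new_ruler.from_bytes(ruler_bytes)  # modifies new_ruler in place and returns it

doc = new_nlp("Apple is a company.")
found = [(span.text, span.label_) for span in doc.spans["ruler"]]
print(found)
```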

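SpanRuler.labels property

All labels present in the match patterns.

Name | Description | Type
RETURNS | The string labels. | Tuple[str, ...]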

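SpanRuler.ids property

All IDs present in the id property of the match patterns.

Name | Description | Type
RETURNS | The string IDs. | Tuple[str, ...]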

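SpanRuler.patterns property

All patterns that were added to the span ruler.

Name | Description | Type
RETURNS | The original patterns, one dictionary per pattern. | List[Dict[str, Union[str, List[dict]]]]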
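A sketch of the three properties, assuming spaCy v3.3 or later; the labels and IDs are made up. Labels and IDs are reported once each, while patterns returns every pattern dictionary that was added.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
patterns = [
    {"label": "ORG", "pattern": "Apple", "id": "apple"},
    {"label": "ORG", "pattern": [{"LOWER": "apple"}, {"LOWER": "inc"}], "id": "apple"},
    {"label": "GPE", "pattern": "Paris", "id": "paris"},
]
ruler.add_patterns(patterns)

labels = ruler.labels    # each distinct label once
ids = ruler.ids          # each distinct ID once
stored = ruler.patterns  # the pattern dictionaries as they were added
print(labels, ids, len(stored))
```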

Attributes

Name | Description | Type
key | The spans key that spans are saved under. | Optional[str]
matcher | The underlying matcher used to process token patterns. | Matcher
phrase_matcher | The underlying phrase matcher used to process phrase patterns. | PhraseMatcher
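A sketch showing the attributes, assuming spaCy v3.3 or later and illustrative patterns. In this example the single token pattern is registered with matcher and the single phrase pattern with phrase_matcher.

```python
import spacy
from spacy.matcher import Matcher, PhraseMatcher

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler", config={"spans_key": "custom"})
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},               # phrase pattern
    {"label": "GPE", "pattern": [{"LOWER": "paris"}]},  # token pattern
])

print(ruler.key)                  # the configured spans key
print(len(ruler.matcher))         # rules registered for token patterns
print(len(ruler.phrase_matcher))  # match IDs registered for phrase patterns
```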