Pipeline

AttributeRuler

class, v3
String name: attribute_ruler
Trainable: no
Pipeline component for rule-based token attribute assignment

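The attribute ruler lets you set token attributes for tokens identified by Matcher patterns. It is typically used to handle exceptions for token attributes and to map values between attributes, such as mapping fine-grained POS tags to coarse-grained POS tags. See the usage guide for examples.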

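Config and implementation

The default config is defined by the pipeline component factory and describes how the component should be configured. You can override its settings via the config argument on nlp.add_pipe or in your config.cfg for training.

Setting: Description (Type)
validate: Whether patterns should be validated (passed to the Matcher). Defaults to False. (bool)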
explosion/spaCy/master/spacy/pipeline/attributeruler.py
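
As a minimal sketch of overriding the default setting, the snippet below passes validate through the config argument of nlp.add_pipe. The blank English pipeline from spacy.blank("en") is an assumption made only to keep the example self-contained.

```python
import spacy

# Assumption: a blank English pipeline, so no trained model is required.
nlp = spacy.blank("en")

# Override the default validate=False when adding the component.
ruler = nlp.add_pipe("attribute_ruler", config={"validate": True})

print(nlp.pipe_names)
```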

AttributeRuler.__init__ method

Initialize the attribute ruler.

Name: Description (Type)
vocab: The shared vocabulary to pass to the matcher. (Vocab)
name: Instance name of the current pipeline component. Typically passed in automatically from the factory when the component is added. (str)
keyword-only
validate: Whether patterns should be validated (passed to the Matcher). Defaults to False. (bool)
scorer: The scoring method. Defaults to Scorer.score_token_attr for the attributes "tag", "pos", "morph" and "lemma" and Scorer.score_token_attr_per_feat for the attribute "morph". (Optional[Callable])
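
The component can also be constructed directly from the class rather than through the factory. The following is a small sketch under that assumption; the blank English pipeline is used only to obtain a shared vocabulary.

```python
import spacy
from spacy.pipeline import AttributeRuler

# Assumption: a blank English pipeline supplies the shared Vocab.
nlp = spacy.blank("en")

# vocab is positional; validate is keyword-only.
ruler = AttributeRuler(nlp.vocab, validate=True)

print(ruler.name, len(ruler.patterns))
```

A ruler built this way is not part of nlp.pipeline, so in most cases nlp.add_pipe("attribute_ruler") is the more convenient route.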

AttributeRuler.__call__ method

Apply the attribute ruler to a Doc, setting token attributes for tokens matched by the provided patterns.

Name: Description (Type)
doc: The document to process. (Doc)
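
To illustrate the call, here is a hedged sketch that applies a standalone ruler to a Doc created with nlp.make_doc. The pattern, the attribute values and the sample sentence are illustrative assumptions, not part of the reference.

```python
import spacy
from spacy.pipeline import AttributeRuler

nlp = spacy.blank("en")
ruler = AttributeRuler(nlp.vocab)

# Illustrative rule: any token whose lowercase form is "who" gets a lemma and a POS tag.
ruler.add(patterns=[[{"LOWER": "who"}]], attrs={"LEMMA": "who", "POS": "PRON"})

doc = nlp.make_doc("Who did you see?")
result = ruler(doc)  # the Doc is modified in place and returned

print([(token.text, token.lemma_, token.pos_) for token in result])
```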

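AttributeRuler.add method

Add patterns to the attribute ruler. The patterns are a list of Matcher patterns, and the attributes are a dict of attributes to set on the matched token. If a pattern matches a span of more than one token, index can be used to set the attributes for the token at that index in the span. The index may be negative to index from the end of the span.

Name: Description (Type)
patterns: The Matcher patterns to add. (Iterable[List[Dict[Union[int, str], Any]]])
attrs: The attributes to assign to the target token in the matched span. (Dict[str, Any])
index: The index of the token in the matched span to modify. May be negative to index from the end of the span. Defaults to 0. (int)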
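
The role of index is easiest to see with a two-token pattern. In this sketch, the same pattern is added twice so that both tokens of the match receive the attributes, once with index=0 and once with index=-1. The sentence and the tag values are assumptions chosen for illustration.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# One Matcher pattern that spans two tokens: "The Who".
patterns = [[{"LOWER": "the"}, {"TEXT": "Who"}]]
attrs = {"TAG": "NNP", "POS": "PROPN"}

ruler.add(patterns=patterns, attrs=attrs, index=0)   # first token of the span
ruler.add(patterns=patterns, attrs=attrs, index=-1)  # last token of the span

doc = nlp("I saw The Who perform.")
print([(token.text, token.tag_, token.pos_) for token in doc])
```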

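AttributeRuler.add_patterns method

Add patterns from a list of pattern dicts. Each pattern dict can specify the keys "patterns", "attrs" and "index", which match the arguments of AttributeRuler.add.

Name: Description (Type)
patterns: The patterns to add. (Iterable[Dict[str, Union[List[dict], dict, int]]])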
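
A short sketch of the pattern dict format follows. The two rules and the sample text are hypothetical; the first dict omits "index" and so relies on the default of 0.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Each dict mirrors the arguments of AttributeRuler.add.
pattern_dicts = [
    {
        "patterns": [[{"ORTH": "NYC"}]],
        "attrs": {"LEMMA": "New York City", "POS": "PROPN"},
    },
    {
        "patterns": [[{"LOWER": "new"}, {"LOWER": "york"}]],
        "attrs": {"POS": "PROPN"},
        "index": 1,  # only the second token of the match is modified
    },
]
ruler.add_patterns(pattern_dicts)

doc = nlp("NYC is not New York State.")
print([(token.text, token.lemma_, token.pos_) for token in doc])
```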

AttributeRuler.patterns property

Get all patterns that have been added to the attribute ruler, in the patterns_dict format accepted by AttributeRuler.add_patterns.

Name: Description (Type)
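
Because the property returns the same dict format that add_patterns accepts, it can be used to copy rules from one ruler to another. The sketch below assumes two standalone rulers sharing one vocabulary; the variable names are illustrative.

```python
import spacy
from spacy.pipeline import AttributeRuler

nlp = spacy.blank("en")

source = AttributeRuler(nlp.vocab)
source.add(patterns=[[{"LOWER": "who"}]], attrs={"POS": "PRON"}, index=0)

# Export the rules as a list of dicts with "patterns", "attrs" and "index".
exported = source.patterns

# Feed the exported rules into a second ruler.
clone = AttributeRuler(nlp.vocab)
clone.add_patterns(exported)

print(exported[0]["attrs"], exported[0]["index"])
```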

AttributeRuler.initialize method

Initialize the component with data. It is used before training to load rules from a file. This method is typically called by Language.initialize and lets you customize the arguments it receives via the [initialize.components] block in the config.

Name: Description (Type)
get_examples: Function that returns gold-standard annotations in the form of Example objects (the training data). Not used by this component. (Callable[[], Iterable[Example]])
keyword-only
nlp: The current nlp object. Defaults to None. (Optional[Language])
patterns: A list of pattern dicts with the keys as the arguments to AttributeRuler.add (patterns/attrs/index) to add as patterns. Defaults to None. (Optional[Iterable[Dict[str, Union[List[dict], dict, int]]]])
tag_map: The tag map that maps fine-grained tags to coarse-grained tags and morphological features. Defaults to None. (Optional[Dict[str, Dict[Union[int, str], Union[int, str]]]])
morph_rules: The morph rules that map token text and fine-grained tags to coarse-grained tags, lemmas and morphological features. Defaults to None. (Optional[Dict[str, Dict[str, Dict[Union[int, str], Union[int, str]]]]])
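
Although the method is normally invoked through Language.initialize, it can be called by hand. This sketch does so with an empty get_examples callback (the component does not use it) and with in-memory rules in place of a file; all rule contents are assumptions for illustration.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

ruler.initialize(
    lambda: [],  # get_examples is required by the signature but unused here
    nlp=nlp,
    patterns=[{"patterns": [[{"LOWER": "who"}]], "attrs": {"POS": "PRON"}}],
    tag_map={"NNP": {"POS": "PROPN"}},
)

# One rule from patterns plus one rule derived from the tag map.
print(len(ruler.patterns))
```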

AttributeRuler.load_from_tag_map method

Load attribute ruler patterns from a tag map.

Name: Description (Type)
tag_map: The tag map that maps fine-grained tags to coarse-grained tags and morphological features. (Dict[str, Dict[Union[int, str], Union[int, str]]])
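
A minimal sketch of a tag map in use is shown here. Since a blank pipeline has no tagger, the fine-grained tags are supplied by hand when the Doc is built; the tag map entries themselves are illustrative assumptions.

```python
import spacy
from spacy.pipeline import AttributeRuler
from spacy.tokens import Doc

nlp = spacy.blank("en")
ruler = AttributeRuler(nlp.vocab)

# Fine-grained tag -> coarse-grained POS plus morphological features.
tag_map = {
    "VBP": {"POS": "VERB", "VerbForm": "Fin"},
    "NN": {"POS": "NOUN"},
}
ruler.load_from_tag_map(tag_map)

# Assumption: tags are set manually instead of being predicted by a tagger.
doc = Doc(nlp.vocab, words=["I", "run"], tags=["PRP", "VBP"])
doc = ruler(doc)

print([(token.text, token.tag_, token.pos_, str(token.morph)) for token in doc])
```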

AttributeRuler.load_from_morph_rules method

Load attribute ruler patterns from morph rules.

Name: Description (Type)
morph_rules: The morph rules that map token text and fine-grained tags to coarse-grained tags, lemmas and morphological features. (Dict[str, Dict[str, Dict[Union[int, str], Union[int, str]]]])
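
The nested structure of morph rules can be sketched as follows, assuming the outer key is the fine-grained tag and the inner key is the exact token text. The rule contents and the hand-tagged Doc are illustrative assumptions.

```python
import spacy
from spacy.pipeline import AttributeRuler
from spacy.tokens import Doc

nlp = spacy.blank("en")
ruler = AttributeRuler(nlp.vocab)

# Assumed layout: {fine-grained tag: {token text: {attribute or feature: value}}}
morph_rules = {
    "DT": {"the": {"POS": "DET", "LEMMA": "the", "Definite": "Def"}},
}
ruler.load_from_morph_rules(morph_rules)

# Assumption: tags are set manually instead of being predicted by a tagger.
doc = Doc(nlp.vocab, words=["the", "cat"], tags=["DT", "NN"])
doc = ruler(doc)

print([(token.text, token.pos_, token.lemma_, str(token.morph)) for token in doc])
```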

AttributeRuler.to_disk method

Serialize the pipe to disk.

Name: Description (Type)
path: A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects. (Union[str, Path])
keyword-only
exclude: String names of serialization fields to exclude. (Iterable[str])

AttributeRuler.from_disk method

Load the pipe from disk. Modifies the object in place and returns it.

Name: Description (Type)
path: A path to a directory. Paths may be either strings or Path-like objects. (Union[str, Path])
keyword-only
exclude: String names of serialization fields to exclude. (Iterable[str])
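
A round trip through disk might look like the sketch below. The temporary directory and the directory name attribute_ruler are assumptions made so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.pipeline import AttributeRuler

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "who"}]], attrs={"POS": "PRON"})

with tempfile.TemporaryDirectory() as tmp_dir:
    # The target directory does not exist yet; to_disk creates it.
    path = Path(tmp_dir) / "attribute_ruler"
    ruler.to_disk(path)

    # Load into a fresh ruler; from_disk returns the modified object.
    restored = AttributeRuler(spacy.blank("en").vocab).from_disk(path)

print(len(restored.patterns))
```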

AttributeRuler.to_bytes method

Serialize the pipe to a bytestring.

Name: Description (Type)
keyword-only
exclude: String names of serialization fields to exclude. (Iterable[str])

AttributeRuler.from_bytes method

Load the pipe from a bytestring. Modifies the object in place and returns it.

Name: Description (Type)
bytes_data: The data to load from. (bytes)
keyword-only
exclude: String names of serialization fields to exclude. (Iterable[str])
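
The in-memory equivalent is sketched here: one ruler is serialized to bytes and the result is loaded into a second, empty ruler. The rule is an illustrative assumption.

```python
import spacy
from spacy.pipeline import AttributeRuler

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "who"}]], attrs={"POS": "PRON"})

# Serialize, then restore into a fresh ruler with its own vocabulary.
data = ruler.to_bytes()
restored = AttributeRuler(spacy.blank("en").vocab).from_bytes(data)

print(type(data).__name__, len(restored.patterns))
```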

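Serialization fields

During serialization, spaCy will export several data fields used to restore different aspects of the object. If needed, you can exclude them from serialization by passing in the string names via the exclude argument.

Name: Description
vocab: The shared Vocab.
patterns: The Matcher patterns. You usually don’t want to exclude this.
attrs: The attributes to set. You usually don’t want to exclude this.
indices: The token indices. You usually don’t want to exclude this.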
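
As a sketch of the exclude argument, the example below leaves out the vocab field, which may be reasonable when the receiving ruler already shares the same vocabulary. That scenario is an assumption for illustration.

```python
import spacy
from spacy.pipeline import AttributeRuler

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "who"}]], attrs={"POS": "PRON"})

full = ruler.to_bytes()
without_vocab = ruler.to_bytes(exclude=["vocab"])

# The receiving ruler reuses nlp.vocab, so the vocab field is not needed.
restored = AttributeRuler(nlp.vocab).from_bytes(without_vocab, exclude=["vocab"])

print(len(full), len(without_vocab), len(restored.patterns))
```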