AttributeRuler

class, v3

String name: attribute_ruler
Trainable: no

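Pipeline component for rule-based token attribute assignment

The attribute ruler lets you set token attributes for tokens identified by Matcher patterns. It is typically used to handle exceptions for token attributes and to map values between attributes, for example mapping fine-grained POS tags to coarse-grained POS tags. See the usage guide for examples.

Config and implementation

The default config is defined by the pipeline component factory and describes how the component should be configured. You can override its settings via the config argument on nlp.add_pipe or in your config.cfg for training.

Setting	Description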
`validate`	Whether patterns should be validated (passed to the `Matcher`). Defaults to `False`. bool
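
As a minimal sketch of overriding this setting, the snippet below adds the component to a blank English pipeline with `validate` enabled. The blank pipeline and the variable names are illustrative assumptions.

```python
import spacy

# A blank English pipeline is assumed here so no trained model is required.
nlp = spacy.blank("en")

# Override the default config: ask the Matcher to validate patterns.
ruler = nlp.add_pipe("attribute_ruler", config={"validate": True})

print(nlp.pipe_names)
```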

explosion/spaCy/master/spacy/pipeline/attributeruler.py

AttributeRuler.__init__ method

Initialize the attribute ruler.

Name	Description
`vocab`	The shared vocabulary to pass to the matcher. Vocab
`name`	Instance name of the current pipeline component. Typically passed in automatically from the factory when the component is added. str
keyword-only
`validate`	Whether patterns should be validated (passed to the `Matcher`). Defaults to `False`. bool
`scorer`	The scoring method. Defaults to `Scorer.score_token_attr` for the attributes `"tag"`, `"pos"`, `"morph"` and `"lemma"` and `Scorer.score_token_attr_per_feat` for the attribute `"morph"`. Optional[Callable]
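
The component can also be constructed directly rather than through the factory. A short sketch, assuming a blank `English` object supplies the shared vocab:

```python
from spacy.lang.en import English
from spacy.pipeline import AttributeRuler

nlp = English()

# Construct the ruler directly from the shared vocab.
# `validate` is keyword-only; False is its documented default.
ruler = AttributeRuler(nlp.vocab, name="attribute_ruler", validate=False)

# No patterns have been added yet.
print(ruler.patterns)
```

A ruler built this way is not part of `nlp.pipeline`; use `nlp.add_pipe("attribute_ruler")` when the component should run as part of the pipeline.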

AttributeRuler.__call__ method

Apply the attribute ruler to a Doc, setting token attributes for tokens matched by the provided patterns.

Name	Description
`doc`	The document to process. Doc
RETURNS	The processed document. Doc
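
The following sketch calls the component directly on a `Doc`. The example text and the single pattern are assumptions chosen for illustration; in normal use the call happens implicitly when you run `nlp(text)`.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Illustrative rule: tag any token whose lowercase form is "the" as a determiner.
ruler.add(patterns=[[{"LOWER": "the"}]], attrs={"POS": "DET"})

# Tokenize only, then apply the ruler by hand.
doc = nlp.make_doc("The cat sat.")
doc = ruler(doc)

print([(token.text, token.pos_) for token in doc])
```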

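AttributeRuler.add method

Add patterns to the attribute ruler. The patterns are a list of Matcher patterns and the attributes are a dict of attributes to set on the matched token. If a pattern matches a span of more than one token, index can be used to select which token in that span receives the attributes. The index may be negative to index from the end of the span.

Name	Description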
`patterns`	The `Matcher` patterns to add. Iterable[List[Dict[Union[int, str], Any]]]
`attrs`	The attributes to assign to the target token in the matched span. Dict[str, Any]
`index`	The index of the token in the matched span to modify. May be negative to index from the end of the span. Defaults to `0`. int
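
A short sketch of how `index` selects the target token inside a multi-token match. The sentence and attribute values are illustrative assumptions.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# One two-token pattern, used twice with different target tokens.
patterns = [[{"LOWER": "the"}, {"TEXT": "Who"}]]

# index=0 targets the first token of the matched span ("The").
ruler.add(patterns=patterns, attrs={"TAG": "NNP", "POS": "PROPN"}, index=0)

# index=-1 targets the last token of the matched span ("Who").
ruler.add(patterns=patterns, attrs={"LEMMA": "Who"}, index=-1)

doc = nlp("I saw The Who perform.")
print(doc[2].text, doc[2].tag_, doc[2].pos_)
print(doc[3].text, doc[3].lemma_)
```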

AttributeRuler.add_patterns method

Add patterns from a list of pattern dicts. Each pattern dict can specify the keys "patterns", "attrs" and "index", which match the arguments of AttributeRuler.add.

Name	Description
`patterns`	The patterns to add. Iterable[Dict[str, Union[List[dict], dict, int]]]
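
For illustration, a sketch that registers two rules in one call. The rules themselves are made-up examples; only the dict keys come from the description above.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

pattern_dicts = [
    # "index" is omitted here, so the default of 0 applies.
    {"patterns": [[{"LOWER": "the"}]], "attrs": {"POS": "DET"}},
    # Set the lemma on the last token of the two-token match.
    {
        "patterns": [[{"LOWER": "two"}, {"LOWER": "apples"}]],
        "attrs": {"LEMMA": "apple"},
        "index": -1,
    },
]
ruler.add_patterns(pattern_dicts)

doc = nlp("the two apples")
print([(token.text, token.pos_, token.lemma_) for token in doc])
```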

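AttributeRuler.patterns property

Get all patterns that have been added to the attribute ruler, in the patterns_dict format accepted by AttributeRuler.add_patterns.

Name	Description
RETURNS	The patterns added to the attribute ruler. List[Dict[str, Union[List[dict], dict, int]]]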
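
A brief sketch of reading the patterns back after adding them. The two rules are illustrative assumptions.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"TAG": "VB"}]], attrs={"POS": "VERB"})
ruler.add(patterns=[[{"LOWER": "two"}, {"LOWER": "apples"}]],
          attrs={"LEMMA": "apple"}, index=-1)

# Each entry is a dict with the keys "patterns", "attrs" and "index".
for entry in ruler.patterns:
    print(entry["attrs"], entry["index"])
```

Because the result uses the same format that `add_patterns` accepts, it can be passed to another ruler to copy the rules.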

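AttributeRuler.initialize method

Initialize the component with data. This method is used before training to load rules from a file. It is typically called by Language.initialize, and you can customize the arguments it receives via the [initialize.components] block in the config.

Name	Description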
`get_examples`	Function that returns gold-standard annotations in the form of `Example` objects (the training data). Not used by this component. Callable[[], Iterable[Example]]
keyword-only
`nlp`	The current `nlp` object. Defaults to `None`. Optional[Language]
`patterns`	A list of pattern dicts with the keys as the arguments to `AttributeRuler.add` (`patterns`/`attrs`/`index`) to add as patterns. Defaults to `None`. Optional[Iterable[Dict[str, Union[List[dict], dict, int]]]]
`tag_map`	The tag map that maps fine-grained tags to coarse-grained tags and morphological features. Defaults to `None`. Optional[Dict[str, Dict[Union[int, str], Union[int, str]]]]
`morph_rules`	The morph rules that map token text and fine-grained tags to coarse-grained tags, lemmas and morphological features. Defaults to `None`. Optional[Dict[str, Dict[str, Dict[Union[int, str], Union[int, str]]]]]
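
The method is normally invoked for you by `Language.initialize`. As a rough sketch of what it does, the snippet below calls it by hand with an in-memory pattern list; the empty `get_examples` callback and the single rule are assumptions for illustration.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

pattern_dicts = [
    {"patterns": [[{"LOWER": "the"}]], "attrs": {"POS": "DET"}},
]

# get_examples is not used by this component, so an empty callback suffices.
ruler.initialize(lambda: [], nlp=nlp, patterns=pattern_dicts)

print(len(ruler.patterns))
```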

AttributeRuler.load_from_tag_map method

Load attribute ruler patterns from a tag map.

Name	Description
`tag_map`	The tag map that maps fine-grained tags to coarse-grained tags and morphological features. Dict[str, Dict[Union[int, str], Union[int, str]]]
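
A small sketch of loading a tag map and applying it. The tag map and the hand-built `Doc` with preset tags are illustrative assumptions; in a real pipeline the fine-grained tags would come from a tagger.

```python
from spacy.lang.en import English
from spacy.pipeline import AttributeRuler
from spacy.tokens import Doc

nlp = English()
ruler = AttributeRuler(nlp.vocab)

# Fine-grained tag -> coarse-grained POS plus optional morphological features.
tag_map = {
    ".": {"POS": "PUNCT", "PunctType": "peri"},
    ",": {"POS": "PUNCT"},
}
ruler.load_from_tag_map(tag_map)

# Build a Doc with fine-grained tags already set.
doc = Doc(nlp.vocab, words=["Yes", ",", "done", "."], tags=["_", ",", "_", "."])
doc = ruler(doc)

print([(token.text, token.pos_, str(token.morph)) for token in doc])
```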

AttributeRuler.load_from_morph_rules method

Load attribute ruler patterns from morph rules.

Name	Description
`morph_rules`	The morph rules that map token text and fine-grained tags to coarse-grained tags, lemmas and morphological features. Dict[str, Dict[str, Dict[Union[int, str], Union[int, str]]]]
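
A sketch of morph rules keyed by fine-grained tag and then by token text. The rule and the hand-tagged `Doc` are assumptions for illustration.

```python
from spacy.lang.en import English
from spacy.pipeline import AttributeRuler
from spacy.tokens import Doc

nlp = English()
ruler = AttributeRuler(nlp.vocab)

# tag -> token text -> attributes and morphological features to assign.
morph_rules = {"DT": {"the": {"POS": "DET", "LEMMA": "a", "Case": "Nom"}}}
ruler.load_from_morph_rules(morph_rules)

doc = Doc(
    nlp.vocab,
    words=["This", "is", "the", "test", "."],
    tags=["DT", "VBZ", "DT", "NN", "."],
)
doc = ruler(doc)

# Only "the" matches both the text and the tag; "This" has the tag but not the text.
print(doc[2].pos_, doc[2].lemma_, str(doc[2].morph))
```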

AttributeRuler.to_disk method

Serialize the pipe to disk.

Name	Description
`path`	A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or `Path`-like objects. Union[str,Path]
keyword-only
`exclude`	String names of serialization fields to exclude. Iterable[str]

AttributeRuler.from_disk method

Load the pipe from disk. Modifies the object in place and returns it.

Name	Description
`path`	A path to a directory. Paths may be either strings or `Path`-like objects. Union[str,Path]
keyword-only
`exclude`	String names of serialization fields to exclude. Iterable[str]
RETURNS	The modified `AttributeRuler` object. AttributeRuler
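
A sketch of a save and load round trip using `to_disk` and `from_disk`. The temporary directory and the single rule are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "the"}]], attrs={"POS": "DET"})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "attribute_ruler"
    # The target directory is created if it does not exist yet.
    ruler.to_disk(path)

    # Load into a fresh component in a separate pipeline.
    new_nlp = spacy.blank("en")
    new_ruler = new_nlp.add_pipe("attribute_ruler")
    new_ruler.from_disk(path)

doc = new_nlp("the cat")
print(doc[0].pos_)
```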

AttributeRuler.to_bytes method

Serialize the pipe to a bytestring.

Name	Description
keyword-only
`exclude`	String names of serialization fields to exclude. Iterable[str]
RETURNS	The serialized form of the `AttributeRuler` object. bytes

AttributeRuler.from_bytes method

Load the pipe from a bytestring. Modifies the object in place and returns it.

Name	Description
`bytes_data`	The data to load from. bytes
keyword-only
`exclude`	String names of serialization fields to exclude. Iterable[str]
RETURNS	The `AttributeRuler` object. AttributeRuler
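
The same round trip can be done in memory with `to_bytes` and `from_bytes`. A minimal sketch, with an illustrative rule:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "the"}]], attrs={"POS": "DET"})

# Serialize to a bytestring.
ruler_bytes = ruler.to_bytes()

# Restore into a fresh component; from_bytes returns the modified object.
new_nlp = spacy.blank("en")
new_ruler = new_nlp.add_pipe("attribute_ruler").from_bytes(ruler_bytes)

doc = new_nlp("the cat")
print(doc[0].pos_)
```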

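Serialization fields

During serialization, spaCy will export several data fields used to restore different aspects of the object. If needed, you can exclude them from serialization by passing in the string names via the exclude argument.

Name	Description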
`vocab`	The shared `Vocab`.
`patterns`	The `Matcher` patterns. You usually don’t want to exclude this.
`attrs`	The attributes to set. You usually don’t want to exclude this.
`indices`	The token indices. You usually don’t want to exclude this.
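
As a sketch of the `exclude` argument, the snippet below serializes the component with and without the shared vocab. The rule is an illustrative assumption; excluding `vocab` is reasonable when the receiving pipeline already has its own.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "the"}]], attrs={"POS": "DET"})

full_bytes = ruler.to_bytes()
# Skip the shared vocab when serializing.
slim_bytes = ruler.to_bytes(exclude=["vocab"])

# Restore the patterns into a component that keeps its own vocab.
new_nlp = spacy.blank("en")
new_ruler = new_nlp.add_pipe("attribute_ruler")
new_ruler.from_bytes(slim_bytes, exclude=["vocab"])

print(len(full_bytes), len(slim_bytes))
```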
