SpanFinder

class, experimental, v3.6

String name: span_finder. Trainable: yes.

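Pipeline component for identifying potentially overlapping spans of text

The span finder identifies potentially overlapping, unlabeled spans. It identifies tokens that start or end spans and annotates unlabeled spans between those starts and ends, with optional filters for minimum and maximum span length. It is intended for use in combination with a component like SpanCategorizer, which may further filter or label the spans. Predicted spans are saved in a SpanGroup on the doc under doc.spans[spans_key], where spans_key is a component config setting.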
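The pairing of start and end tokens described above can be illustrated in plain Python. This is a minimal sketch, not spaCy's implementation: the function name `candidate_spans`, the `(start_probability, end_probability)` score layout, and the use of `>=` for the threshold comparison are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def candidate_spans(
    scores: List[Tuple[float, float]],
    threshold: float = 0.5,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Pair start and end tokens into (start, end_exclusive) token offsets."""
    # Tokens whose start or end probability reaches the threshold (assumed >=).
    starts = [i for i, (p_start, _) in enumerate(scores) if p_start >= threshold]
    ends = [i for i, (_, p_end) in enumerate(scores) if p_end >= threshold]
    spans = []
    for start in starts:
        for end in ends:
            length = end + 1 - start
            if length < 1:
                continue  # the end token precedes the start token
            if min_length is not None and length < min_length:
                continue
            if max_length is not None and length > max_length:
                continue
            spans.append((start, end + 1))
    return spans


# Hypothetical per-token scores for a three-token document.
scores = [(0.9, 0.1), (0.2, 0.8), (0.7, 0.6)]
all_spans = candidate_spans(scores)
short_spans = candidate_spans(scores, max_length=2)
```

Because every start token is paired with every later end token, the candidates can overlap, which is why a downstream component is expected to filter or label them.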

Assigned Attributes

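Predictions will be saved to Doc.spans[spans_key] as a SpanGroup.

spans_key defaults to "sc", but can be passed as a parameter. The span_finder component will overwrite any existing spans under the spans key doc.spans[spans_key].

Location	Value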
`Doc.spans[spans_key]`	The unlabeled spans. SpanGroup

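Config and implementation

The default config is defined by the pipeline component factory and describes how the component should be configured. You can override its settings via the config argument on nlp.add_pipe or in your config.cfg for training. See the model architectures documentation for details on the architectures and their arguments and hyperparameters.

Setting	Description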
`model`	A model instance that is given a list of documents and predicts a probability for each token. Model[List[Doc],Floats2d]
`spans_key`	Key of the `Doc.spans` dict to save the spans under. During initialization and training, the component will look for spans on the reference document under the same key. Defaults to `"sc"`. str
`threshold`	Minimum probability to consider a prediction positive. Defaults to `0.5`. float
`max_length`	Maximum length of the produced spans, defaults to `25`. Optional[int]
`min_length`	Minimum length of the produced spans, defaults to `None` meaning shortest span length is 1. Optional[int]
`scorer`	The scoring method. Defaults to `Scorer.score_spans` for `Doc.spans[spans_key]` with overlapping spans allowed. Optional[Callable]
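As a sketch of overriding the settings above through nlp.add_pipe, assuming spaCy v3.6 or later is installed. The key name "my_spans" and the specific values are illustrative only; settings left out of the dict keep their factory defaults.

```python
import spacy

nlp = spacy.blank("en")

# Illustrative overrides; "my_spans" is a hypothetical key name.
config = {
    "spans_key": "my_spans",
    "threshold": 0.4,
    "min_length": 1,
    "max_length": 10,
}
span_finder = nlp.add_pipe("span_finder", config=config)
print(nlp.pipe_names)
```

The component still has to be initialized or loaded from disk before it can make predictions.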

explosion/spaCy/master/spacy/pipeline/span_finder.py

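SpanFinder.__init__ method

Create a new pipeline instance. In your application, you would normally use a shortcut for this and instantiate the component using its string name and nlp.add_pipe.

Name	Description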
`vocab`	The shared vocabulary. Vocab
`model`	A model instance that is given a list of documents and predicts a probability for each token. Model[List[Doc],Floats2d]
`name`	String name of the component instance. Used to add entries to the `losses` during training. str
keyword-only
`spans_key`	Key of the `Doc.spans` dict to save the spans under. During initialization and training, the component will look for spans on the reference document under the same key. Defaults to `"sc"`. str
`threshold`	Minimum probability to consider a prediction positive. Defaults to `0.5`. float
`max_length`	Maximum length of the produced spans, defaults to `None` meaning unlimited length. Optional[int]
`min_length`	Minimum length of the produced spans, defaults to `None` meaning shortest span length is 1. Optional[int]
`scorer`	The scoring method. Defaults to `Scorer.score_spans` for `Doc.spans[spans_key]` with overlapping spans allowed. Optional[Callable]

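SpanFinder.__call__ method

Apply the pipe to one document. The document is modified in place and returned. This usually happens under the hood when the nlp object is called on a text and all pipeline components are applied to the Doc in order. Both __call__ and pipe delegate to the predict and set_annotations methods.

Name	Description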
`doc`	The document to process. Doc
RETURNS	The processed document. Doc
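A minimal sketch of applying the component to a single document, assuming spaCy v3.6 or later. The one-example dataset is hypothetical and only initializes the weights, so the predicted spans of this untrained model are not meaningful. The "LOC" label is included because the example format accepts it; the span finder itself produces unlabeled spans.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
span_finder = nlp.add_pipe("span_finder")  # default spans_key is "sc"

# Hypothetical one-example dataset used only to initialize the model.
train_doc = nlp.make_doc("I like London and Berlin.")
example = Example.from_dict(
    train_doc, {"spans": {"sc": [(7, 13, "LOC"), (18, 24, "LOC")]}}
)
nlp.initialize(get_examples=lambda: [example])

# Calling the component directly modifies the Doc in place and returns it.
doc = nlp.make_doc("She moved from Paris to Rome.")
processed = span_finder(doc)
for span in processed.spans["sc"]:
    print(span.start, span.end, span.text)
```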

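SpanFinder.pipe method

Apply the pipe to a stream of documents. This usually happens under the hood when the nlp object is called on a text and all pipeline components are applied to the Doc in order. Both __call__ and pipe delegate to the predict and set_annotations methods.

Name	Description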
`stream`	A stream of documents. Iterable[Doc]
keyword-only
`batch_size`	The number of documents to buffer. Defaults to `128`. int
YIELDS	The processed documents in order. Doc
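A sketch of streaming documents through the component with an explicit batch size, assuming spaCy v3.6 or later. The texts and the initialization example are hypothetical, and the model is untrained.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
span_finder = nlp.add_pipe("span_finder")

# Hypothetical one-example dataset used only to initialize the model.
train_doc = nlp.make_doc("I like London and Berlin.")
example = Example.from_dict(
    train_doc, {"spans": {"sc": [(7, 13, "LOC"), (18, 24, "LOC")]}}
)
nlp.initialize(get_examples=lambda: [example])

texts = ["She moved to Paris.", "Rome is warm.", "He left Berlin."]
stream = (nlp.make_doc(text) for text in texts)

# Documents are buffered in batches of two and yielded in their original order.
processed = list(span_finder.pipe(stream, batch_size=2))
```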

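SpanFinder.initialize method

Initialize the component for training. get_examples should be a function that returns an iterable of Example objects. At least one example should be supplied. The data examples are used to initialize the model of the component and can either be the full training data or a representative sample. Initialization includes validating the network and inferring missing shapes. This method is typically called by Language.initialize and lets you customize the arguments it receives via the [initialize.components] block in the config.

Name	Description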
`get_examples`	Function that returns gold-standard annotations in the form of `Example` objects. Must contain at least one `Example`. Callable[[], Iterable[Example]]
keyword-only
`nlp`	The current `nlp` object. Defaults to `None`. Optional[Language]
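A minimal sketch of calling initialize on the component directly, assuming spaCy v3.6 or later. The single annotated example is hypothetical; in practice Language.initialize would normally make this call for you.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
span_finder = nlp.add_pipe("span_finder")

# Gold-standard spans are read from the reference doc under the same spans_key.
doc = nlp.make_doc("I like London and Berlin.")
examples = [
    Example.from_dict(doc, {"spans": {"sc": [(7, 13, "LOC"), (18, 24, "LOC")]}})
]

# get_examples must be a callable that returns the examples.
span_finder.initialize(lambda: examples, nlp=nlp)
```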

SpanFinder.predict method

Apply the component's model to a batch of Doc objects without modifying them.

Name	Description
`docs`	The documents to predict. Iterable[Doc]
RETURNS	The model's prediction for each document.

SpanFinder.set_annotations method

Modify a batch of Doc objects using pre-computed scores.

Name	Description
`docs`	The documents to modify. Iterable[Doc]
`scores`	The scores to set, produced by `SpanFinder.predict`.
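The two-step flow of predict followed by set_annotations can be sketched as follows, assuming spaCy v3.6 or later. The texts and the initialization example are hypothetical. The comment on the score layout (one row per token across the batch, two columns) reflects the per-token model output described in the config table and should be read as an assumption.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
span_finder = nlp.add_pipe("span_finder")

# Hypothetical one-example dataset used only to initialize the model.
train_doc = nlp.make_doc("I like London and Berlin.")
example = Example.from_dict(
    train_doc, {"spans": {"sc": [(7, 13, "LOC"), (18, 24, "LOC")]}}
)
nlp.initialize(get_examples=lambda: [example])

docs = [nlp.make_doc("She moved from Paris to Rome."), nlp.make_doc("Berlin is cold.")]

# predict leaves the docs untouched; assumed layout: one row per token, two columns.
scores = span_finder.predict(docs)
untouched = all("sc" not in d.spans for d in docs)

# set_annotations writes the resulting spans to doc.spans["sc"].
span_finder.set_annotations(docs, scores)
```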

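SpanFinder.update method

Learn from a batch of Example objects containing the predictions and gold-standard annotations, and update the component's model. Delegates to predict and get_loss.

Name	Description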
`examples`	A batch of `Example` objects to learn from. Iterable[Example]
keyword-only
`drop`	The dropout rate. float
`sgd`	An optimizer. Will be created via `create_optimizer` if not set. Optional[Optimizer]
`losses`	Optional record of the loss during training. Updated using the component name as the key. Optional[Dict[str, float]]
RETURNS	The updated `losses` dictionary. Dict[str, float]
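A short training-loop sketch built on update, assuming spaCy v3.6 or later. The single example, the five iterations and the dropout rate are arbitrary illustrative choices.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
span_finder = nlp.add_pipe("span_finder")

doc = nlp.make_doc("I like London and Berlin.")
examples = [
    Example.from_dict(doc, {"spans": {"sc": [(7, 13, "LOC"), (18, 24, "LOC")]}})
]
optimizer = nlp.initialize(get_examples=lambda: examples)

for _ in range(5):
    # A fresh dict per step; the loss is recorded under the component name.
    losses = span_finder.update(examples, drop=0.1, sgd=optimizer, losses={})

print(losses)
```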

SpanFinder.get_loss method

Find the loss and the gradient of the loss for the batch of documents and their predicted scores.

Name	Description
`examples`	The batch of examples. Iterable[Example]
`spans_scores`	Scores representing the model’s predictions. Tuple[Ragged,Floats2d]
RETURNS	The loss and the gradient, i.e. `(loss, gradient)`. Tuple[float,Floats2d]
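A sketch of computing the loss for one batch, assuming spaCy v3.6 or later. It passes the output of predict positionally as the scores argument, which is an assumption about what get_loss accepts in the installed version; the example data is hypothetical.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
span_finder = nlp.add_pipe("span_finder")

doc = nlp.make_doc("I like London and Berlin.")
examples = [
    Example.from_dict(doc, {"spans": {"sc": [(7, 13, "LOC"), (18, 24, "LOC")]}})
]
nlp.initialize(get_examples=lambda: examples)

# Score the predicted docs, then compare them with the gold-standard spans.
scores = span_finder.predict([eg.predicted for eg in examples])
loss, d_scores = span_finder.get_loss(examples, scores)
```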

SpanFinder.create_optimizer method

Create an optimizer for the pipeline component.

Name	Description
RETURNS	The optimizer. Optimizer

SpanFinder.use_params method, contextmanager

Modify the pipe's model to use the given parameter values.

Name	Description
`params`	The parameter values to use in the model. dict

SpanFinder.to_disk method

Serialize the pipe to disk.

Name	Description
`path`	A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or `Path`-like objects. Union[str,Path]
keyword-only
`exclude`	String names of serialization fields to exclude. Iterable[str]

SpanFinder.from_disk method

Load the pipe from disk. Modifies the object in place and returns it.

Name	Description
`path`	A path to a directory. Paths may be either strings or `Path`-like objects. Union[str,Path]
keyword-only
`exclude`	String names of serialization fields to exclude. Iterable[str]
RETURNS	The modified `SpanFinder` object. SpanFinder
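A round-trip sketch for to_disk and from_disk, assuming spaCy v3.6 or later. The temporary directory, the second pipeline and the initialization example are illustrative; the final comparison is only a sanity check that the restored weights match.

```python
import tempfile
from pathlib import Path

import numpy
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
span_finder = nlp.add_pipe("span_finder")
train_doc = nlp.make_doc("I like London and Berlin.")
example = Example.from_dict(
    train_doc, {"spans": {"sc": [(7, 13, "LOC"), (18, 24, "LOC")]}}
)
nlp.initialize(get_examples=lambda: [example])

# A second, uninitialized component that receives the saved state.
nlp2 = spacy.blank("en")
restored = nlp2.add_pipe("span_finder")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "span_finder"
    span_finder.to_disk(path)  # the directory is created if it does not exist
    restored.from_disk(path)

text = "She moved from Paris to Rome."
same_scores = numpy.allclose(
    span_finder.predict([nlp.make_doc(text)]),
    restored.predict([nlp2.make_doc(text)]),
)
```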

SpanFinder.to_bytes method

Serialize the pipe to a bytestring.

Name	Description
keyword-only
`exclude`	String names of serialization fields to exclude. Iterable[str]
RETURNS	The serialized form of the `SpanFinder` object. bytes

SpanFinder.from_bytes method

Load the pipe from a bytestring. Modifies the object in place and returns it.

Name	Description
`bytes_data`	The data to load from. bytes
keyword-only
`exclude`	String names of serialization fields to exclude. Iterable[str]
RETURNS	The `SpanFinder` object. SpanFinder
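A round-trip sketch for to_bytes and from_bytes, assuming spaCy v3.6 or later. It also shows the exclude argument by skipping the "vocab" field on both sides; the second pipeline and the initialization example are illustrative.

```python
import numpy
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
span_finder = nlp.add_pipe("span_finder")
train_doc = nlp.make_doc("I like London and Berlin.")
example = Example.from_dict(
    train_doc, {"spans": {"sc": [(7, 13, "LOC"), (18, 24, "LOC")]}}
)
nlp.initialize(get_examples=lambda: [example])

# Serialize everything except the shared vocabulary.
data = span_finder.to_bytes(exclude=["vocab"])

# Load the bytes into a fresh, uninitialized component in a second pipeline.
nlp2 = spacy.blank("en")
restored = nlp2.add_pipe("span_finder")
restored.from_bytes(data, exclude=["vocab"])

text = "She moved from Paris to Rome."
same_scores = numpy.allclose(
    span_finder.predict([nlp.make_doc(text)]),
    restored.predict([nlp2.make_doc(text)]),
)
```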

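Serialization fields

During serialization, spaCy will export several data fields used to restore different aspects of the object. If needed, you can exclude them from serialization by passing in the string names via the exclude argument.

Name	Description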
`vocab`	The shared `Vocab`.
`cfg`	The config file. You usually don’t want to exclude this.
`model`	The binary model data. You usually don’t want to exclude this.
