Sentencizer

class

String name: sentencizer Trainable:

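Pipeline component for rule-based sentence boundary detection

A simple pipeline component that allows custom sentence boundary detection logic without requiring the dependency parse. By default, sentence segmentation is performed by the DependencyParser, so the Sentencizer lets you implement a simpler, rule-based strategy that does not require a statistical model to be loaded.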

Assigned Attributes

Calculated values are assigned to Token.is_sent_start. The resulting sentences can be accessed using Doc.sents.

Location	Value
`Token.is_sent_start`	A boolean value indicating whether the token starts a sentence. This will be either `True` or `False` for all tokens. bool
`Doc.sents`	An iterator over sentences in the `Doc`, determined by `Token.is_sent_start` values. Iterator[Span]
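
The following is a minimal sketch of how these attributes get populated, assuming spaCy v3 is installed and using a blank English pipeline. The sample text is made up for illustration.

```python
import spacy

# A blank pipeline has a tokenizer but no parser, so nothing else sets sentence boundaries.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("This is a sentence. This is another sentence.")

# Token.is_sent_start is True or False for every token once the sentencizer has run.
sent_starts = [token.text for token in doc if token.is_sent_start]

# Doc.sents yields one Span per sentence.
sentences = [sent.text for sent in doc.sents]
print(sent_starts)
print(sentences)
```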

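The default config is defined by the pipeline component factory and describes how the component should be configured. You can override its settings via the config argument on nlp.add_pipe or in your config.cfg for training.

Setting	Description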
`punct_chars`	Optional custom list of punctuation characters that mark sentence ends. See below for defaults if not set. Defaults to `None`. Optional[List[str]]
`overwrite` v3.2	Whether existing annotation is overwritten. Defaults to `False`. bool
`scorer` v3.2	The scoring method. Defaults to `Scorer.score_spans` for the attribute `"sents"`. Optional[Callable]
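
As an illustration of overriding a setting through the config argument, the sketch below restricts sentence-ending characters to a semicolon. The choice of `";"` and the sample text are assumptions made only for this example. Once `punct_chars` is set, the defaults no longer apply, so a period does not end a sentence here.

```python
import spacy

nlp = spacy.blank("en")
# Override the factory default (punct_chars=None) so that only ";" ends a sentence.
nlp.add_pipe("sentencizer", config={"punct_chars": [";"]})

doc = nlp("First clause; second clause. Done.")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```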

explosion/spaCy/master/spacy/pipeline/sentencizer.pyx

Initialize the sentencizer.

Name	Description
keyword-only
`punct_chars`	Optional custom list of punctuation characters that mark sentence ends. See below for defaults. Optional[List[str]]
`overwrite` v3.2	Whether existing annotation is overwritten. Defaults to `False`. bool
`scorer` v3.2	The scoring method. Defaults to `Scorer.score_spans` for the attribute `"sents"`. Optional[Callable]
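
The sketch below constructs the component directly from the class and contrasts the two `overwrite` values. It assumes spaCy v3.2 or later, where `overwrite` is available. The sample text and the variable names `keep` and `replace` are illustrative.

```python
import spacy
from spacy.pipeline import Sentencizer

nlp = spacy.blank("en")

# Tokens: "Hello", ".", "World", "."
doc = nlp.make_doc("Hello. World.")
# Pre-existing annotation: mark "World" as not starting a sentence.
doc[2].is_sent_start = False

# With overwrite=False the existing value on "World" is kept.
keep = Sentencizer(overwrite=False)
doc = keep(doc)
kept_value = doc[2].is_sent_start

# With overwrite=True the rule-based prediction replaces it.
replace = Sentencizer(overwrite=True)
doc = replace(doc)
overwritten_value = doc[2].is_sent_start

print(kept_value, overwritten_value)
```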

punct_chars defaults
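
One way to inspect the default characters at runtime is to read the `punct_chars` attribute of a component created without arguments. This is a sketch that assumes the attribute is exposed as in the spaCy v3 source. The exact contents of the default set may differ between versions, so no output is shown.

```python
from spacy.pipeline import Sentencizer

# With punct_chars=None the component falls back to its built-in defaults.
sentencizer = Sentencizer()
defaults = sorted(sentencizer.punct_chars)

# Print the number of default characters and a small sample of them.
print(len(defaults), defaults[:5])
```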

Apply the sentencizer on a Doc. Typically, this happens automatically after the component has been added to the pipeline using nlp.add_pipe.

Name	Description
`doc`	The `Doc` object to process, e.g. the `Doc` in the pipeline. Doc
RETURNS	The modified `Doc` with added sentence boundaries. Doc
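
Calling the component directly can be sketched as follows, assuming a blank English pipeline. `nlp.make_doc` is used so that only the tokenizer runs before the sentencizer is applied by hand.

```python
import spacy

nlp = spacy.blank("en")
sentencizer = nlp.add_pipe("sentencizer")

# make_doc only tokenizes, so the Doc has no sentence boundaries yet.
doc = nlp.make_doc("This is a sentence. This is another one.")

# The component modifies the Doc in place and returns it.
processed = sentencizer(doc)
n_sents = len(list(processed.sents))
print(n_sents)
```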

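Apply the pipe to a stream of documents. This usually happens under the hood when the nlp object is called on a text and all pipeline components are applied to the Doc in order.

Name	Description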
`stream`	A stream of documents. Iterable[Doc]
keyword-only
`batch_size`	The number of documents to buffer. Defaults to `128`. int
YIELDS	The processed documents in order. Doc
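
A small sketch of streaming documents through the component, assuming a blank English pipeline. The sample texts and the `batch_size` of 2 are arbitrary choices for illustration.

```python
import spacy

nlp = spacy.blank("en")
sentencizer = nlp.add_pipe("sentencizer")

texts = ["One. Two.", "Just one sentence.", "Who? Me! Yes."]
# A generator of tokenized Docs serves as the input stream.
docs = (nlp.make_doc(text) for text in texts)

# pipe yields the processed Docs in the same order as the input.
counts = [len(list(doc.sents)) for doc in sentencizer.pipe(docs, batch_size=2)]
print(counts)
```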

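Save the sentencizer settings (punctuation characters) to a directory. This creates a file sentencizer.json. It also happens automatically when you save an nlp object with a sentencizer added to its pipeline.

Name	Description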
`path`	A path to a JSON file, which will be created if it doesn’t exist. Paths may be either strings or `Path`-like objects. Union[str,Path]
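
Saving can be sketched like this, writing into a temporary directory so that nothing is left behind. The custom characters and the file location are assumptions for the example. The JSON is read back with the standard library only to show what was stored.

```python
import json
import tempfile
from pathlib import Path

from spacy.pipeline import Sentencizer

sentencizer = Sentencizer(punct_chars=["!", "?"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "sentencizer.json"
    sentencizer.to_disk(path)
    # Read the file back to inspect the saved settings.
    saved = json.loads(path.read_text(encoding="utf8"))

print(sorted(saved["punct_chars"]))
```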

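Load the sentencizer settings from a file. Expects a JSON file. This also happens automatically when you load an nlp object or model with a sentencizer added to its pipeline.

Name	Description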
`path`	A path to a JSON file. Paths may be either strings or `Path`-like objects. Union[str,Path]
RETURNS	The modified `Sentencizer` object. Sentencizer
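
A round trip through disk might look like the sketch below. It uses a temporary directory and a single custom character, both chosen only for illustration, and assumes the `punct_chars` attribute is readable as in the spaCy v3 source.

```python
import tempfile
from pathlib import Path

from spacy.pipeline import Sentencizer

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "sentencizer.json"
    # Save a sentencizer with custom settings first.
    Sentencizer(punct_chars=["!"]).to_disk(path)

    # A fresh component picks up the saved settings; from_disk returns the component itself.
    restored = Sentencizer()
    returned = restored.from_disk(path)

restored_chars = set(restored.punct_chars)
print(restored_chars)
```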