Span

class

A slice from a Doc object.

Span.__init__ method

Create a Span object from the slice doc[start : end].

Name	Description
`doc`	The parent document. Doc
`start`	The index of the first token of the span. int
`end`	The index of the first token after the span. int
`label`	A label to attach to the span, e.g. for named entities. Union[str, int]
`vector`	A meaning representation of the span. numpy.ndarray[ndim=1, dtype=float32]
`vector_norm`	The L2 norm of the span’s vector representation. float
`kb_id`	A knowledge base ID to attach to the span, e.g. for named entities. Union[str, int]
`span_id`	An ID to associate with the span. Union[str, int]
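
A minimal sketch of the constructor, assuming a Doc built by hand from a bare Vocab so that no trained pipeline is needed. The sentence and the "EXAMPLE" label are illustrative only.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

# Build a Doc by hand so the example needs no trained pipeline.
words = ["Give", "it", "back", "!", "He", "pleaded", "."]
spaces = [True, True, False, True, True, False, False]
doc = Doc(Vocab(), words=words, spaces=spaces)

# Tokens 1, 2 and 3: `end` is exclusive, like a Python slice.
span = Span(doc, 1, 4, label="EXAMPLE")  # illustrative label
print(span.text, span.label_)

# Slicing the Doc covers the same tokens, but without a label.
same_tokens = doc[1:4]
```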

Span.__getitem__ method

Get a Token object.

Name	Description
`i`	The index of the token within the span. int
RETURNS	The token at `span[i]`. Token

Get a Span object.

Name	Description
`start_end`	The slice of the span to get. Tuple[int, int]
RETURNS	The span at `span[start : end]`. Span
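
Both forms of indexing can be sketched as follows, assuming a hand-built Doc with an illustrative sentence. Note that the indices are relative to the span, not to the parent Doc.

```python
from spacy.tokens import Doc, Span, Token
from spacy.vocab import Vocab

words = ["Give", "it", "back", "!", "He", "pleaded", "."]
spaces = [True, True, False, True, True, False, False]
doc = Doc(Vocab(), words=words, spaces=spaces)
span = doc[1:4]  # "it back!"

token = span[1]      # integer index: a Token
last = span[-1]      # negative indices count from the end of the span
sub_span = span[1:3]  # slice: another Span
```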

Span.__iter__ method

Iterate over Token objects.

Name	Description
YIELDS	A `Token` object. Token

Span.__len__ method

Get the number of tokens in the span.

Name	Description
RETURNS	The number of tokens in the span. int
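
Iteration and length can be sketched together, again assuming a hand-built Doc with an illustrative sentence.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["Give", "it", "back", "!", "He", "pleaded", "."]
spaces = [True, True, False, True, True, False, False]
doc = Doc(Vocab(), words=words, spaces=spaces)
span = doc[1:4]

# Iterating a span yields its Token objects in order.
texts = [token.text for token in span]

# len() counts tokens, not characters.
n_tokens = len(span)
```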

Span.set_extension classmethod

Define a custom attribute on the Span, which becomes available via Span._. For details, see the documentation on custom attributes.

Name	Description
`name`	Name of the attribute to set by the extension. For example, `"my_attr"` will be available as `span._.my_attr`. str
`default`	Optional default value of the attribute if no getter or method is defined. Optional[Any]
`method`	Set a custom method on the object, for example `span._.compare(other_span)`. Optional[Callable[[Span, …], Any]]
`getter`	Getter function that takes the object and returns an attribute value. Is called when the user accesses the `._` attribute. Optional[Callable[[Span], Any]]
`setter`	Setter function that takes the `Span` and a value, and modifies the object. Is called when the user writes to the `Span._` attribute. Optional[Callable[[Span, Any], None]]
`force`	Force overwriting existing attribute. bool
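
The three kinds of extension (default value, getter, method) can be sketched as below. The attribute names `note`, `has_city` and `shout`, the helper functions, and the city list are all hypothetical; `force=True` is used only so the snippet can be re-run without a registration error.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

CITIES = ("Berlin", "Paris")  # illustrative list

def has_city(span):
    # Getter: computed every time span._.has_city is read.
    return any(token.text in CITIES for token in span)

def shout(span, suffix="!"):
    # Method: receives the span as its first argument.
    return span.text.upper() + suffix

Span.set_extension("note", default="", force=True)
Span.set_extension("has_city", getter=has_city, force=True)
Span.set_extension("shout", method=shout, force=True)

doc = Doc(Vocab(), words=["I", "like", "Berlin"])
span = doc[1:3]

span._.note = "checked"        # attributes with a default are writable
found = span._.has_city
loud = span._.shout()
louder = span._.shout("?!")
```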

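Span.get_extension classmethod

Look up a previously registered extension by name. Returns a 4-tuple (default, method, getter, setter) if the extension is registered. Raises a KeyError otherwise.

Name	Description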
`name`	Name of the extension. str
RETURNS	A `(default, method, getter, setter)` tuple of the extension. Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]

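Span.has_extension classmethod

Check whether an extension has been registered on the Span class.

Name	Description
`name`	Name of the extension to check. str
RETURNS	Whether the extension has been registered. bool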

Span.remove_extension classmethod

Remove a previously registered extension.

Name	Description
`name`	Name of the extension. str
RETURNS	A `(default, method, getter, setter)` tuple of the removed extension. Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]
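
A short sketch of the lookup, check and removal calls working together. The extension name `is_city` is hypothetical.

```python
from spacy.tokens import Span

Span.set_extension("is_city", default=False, force=True)

registered = Span.has_extension("is_city")

# (default, method, getter, setter)
extension = Span.get_extension("is_city")

# Removal returns the same 4-tuple and unregisters the attribute.
removed = Span.remove_extension("is_city")
still_registered = Span.has_extension("is_city")
```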

Span.char_span method

Create a Span object from the slice span.text[start:end]. Returns None if the character indices don’t map to a valid span.

Name	Description
`start`	The index of the first character of the span. int
`end`	The index of the first character after the span. int
`label`	A label to attach to the span, e.g. for named entities. Union[int, str]
`kb_id`	An ID from a knowledge base to capture the meaning of a named entity. Union[int, str]
`vector`	A meaning representation of the span. numpy.ndarray[ndim=1, dtype=float32]
`id`	Unused. Union[int, str]
`alignment_mode` v3.5.1	How character indices snap to token boundaries. Options: `"strict"` (no snapping), `"contract"` (span of all tokens completely within the character span), `"expand"` (span of all tokens at least partially covered by the character span). Defaults to `"strict"`. str
`span_id` v3.5.1	An identifier to associate with the span. Union[int, str]
RETURNS	The newly constructed object or `None`. Optional[Span]
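
A sketch of how the character offsets behave, assuming a hand-built Doc and an illustrative "GPE" label. The last call uses `alignment_mode`, which the table above lists as available from v3.5.1.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# With no `spaces` argument, every token is followed by a single space.
doc = Doc(Vocab(), words=["I", "like", "New", "York"])
span = doc[1:4]  # "like New York"

# Offsets are relative to span.text, not to doc.text.
city = span.char_span(5, 13, label="GPE")

# Offsets 5..12 end inside "York", so the default "strict" mode gives None.
misaligned = span.char_span(5, 12)

# "expand" snaps outwards to all tokens that are at least partly covered.
expanded = span.char_span(5, 12, alignment_mode="expand")
```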

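Span.similarity method (needs model)

Make a semantic similarity estimate. The default estimate is cosine similarity using an average of word vectors.

Name	Description
`other`	The object to compare with. By default, accepts `Doc`, `Span`, `Token` and `Lexeme` objects. Union[Doc, Span, Token, Lexeme]
RETURNS	A scalar similarity score. Higher is more similar. float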
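
To make the default estimate concrete, the sketch below registers two made-up 2-dimensional vectors with `Vocab.set_vector` instead of loading a trained pipeline. The words and vector values are assumptions chosen so the results are easy to check by hand.

```python
import numpy
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()
# Toy vectors, made up for illustration.
vocab.set_vector("cat", numpy.asarray([1.0, 0.0], dtype="float32"))
vocab.set_vector("dog", numpy.asarray([0.0, 1.0], dtype="float32"))

doc = Doc(vocab, words=["cat", "dog", "cat"])

# "cat" vs "dog": the vectors are orthogonal, so the cosine is 0.
orthogonal = doc[0:1].similarity(doc[1:2])

# "cat dog" vs "dog cat": both average to [0.5, 0.5], so the cosine is 1.
same_average = doc[0:2].similarity(doc[1:3])
```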

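Span.get_lca_matrix method

Calculate the lowest common ancestor matrix for a given Span. Returns an LCA matrix containing the integer index of the ancestor, or -1 if no common ancestor is found, e.g. if the span excludes a necessary ancestor.

Name	Description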
RETURNS	The lowest common ancestor matrix of the `Span`. numpy.ndarray[ndim=2, dtype=int32]
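
A sketch of the matrix for a small span, assuming a dependency parse supplied by hand through the `heads` and `deps` arguments of `Doc` rather than predicted by a parser. The parse itself is illustrative. Indices in the matrix are relative to the span.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["I", "like", "New", "York", "in", "Autumn"]
# Absolute index of each token's head; the root points at itself.
heads = [1, 1, 3, 1, 1, 4]
deps = ["nsubj", "ROOT", "compound", "dobj", "prep", "pobj"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

span = doc[1:4]  # "like New York"
matrix = span.get_lca_matrix()

# Row/column order: like (0), New (1), York (2).
# "New" and "York" meet at "York" (2); anything paired with "like" meets at "like" (0).
print(matrix)
```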

Span.to_array method

Given a list of M attribute IDs, export the tokens to a numpy ndarray of shape (N, M), where N is the number of tokens in the span. The values are unsigned 64-bit integers.

Name	Description
`attr_ids`	A list of attributes (int IDs or string names) or a single attribute (int ID or string name). Union[int, str, List[Union[int, str]]]
RETURNS	The exported attributes as a numpy array. Union[numpy.ndarray[ndim=2, dtype=uint64], numpy.ndarray[ndim=1, dtype=uint64]]
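
A minimal sketch of the export, assuming a hand-built Doc and the integer attribute IDs `ORTH` and `LENGTH` from `spacy.attrs`. These two were chosen because they are filled in even for a bare Vocab.

```python
from spacy.attrs import LENGTH, ORTH
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["I", "like", "New", "York", "in", "Autumn"])
span = doc[1:4]  # "like New York"

# One row per token in the span, one column per requested attribute.
array = span.to_array([ORTH, LENGTH])

lengths = array[:, 1].tolist()  # character length of each token
print(array.shape, array.dtype)
```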

Span.ents property (needs model)

The named entities that fall completely within the span. Returns a tuple of Span objects.

Name	Description
RETURNS	Entities in the span, one `Span` per entity. Tuple[Span, …]
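
The "completely within" rule can be sketched by setting `doc.ents` by hand in place of a trained entity recognizer. The sentence and the "GPE" label are illustrative.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["I", "like", "New", "York", "in", "Autumn"])
# Stand-in for a trained NER component: annotate one entity manually.
doc.ents = [Span(doc, 2, 4, label="GPE")]

# "like New York in" contains the whole entity.
inside = doc[1:5].ents

# "York in" cuts through the entity, so it is not reported.
cut_through = doc[3:5].ents
```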

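Span.noun_chunks property (needs model)

Iterate over the base noun phrases in the span. Yields base noun-phrase Span objects, if the document has been syntactically parsed. A base noun phrase, or “NP chunk”, is a noun phrase that does not permit other NPs to be nested within it, so no NP-level coordination, no prepositional phrases, and no relative clauses.

If the noun_chunk syntax iterator has not been implemented for the given language, a NotImplementedError is raised.

Name	Description
YIELDS	Noun chunks in the span. Span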

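Span.as_doc method

Create a new Doc object corresponding to the Span, with a copy of the data.

When calling this on many spans from the same doc, passing in a precomputed array representation of the doc using the array_head and array arguments can save some time.

Name	Description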
`copy_user_data`	Whether or not to copy the original doc’s user data. bool
`array_head`	Precomputed array attributes (headers) of the original doc, as generated by `Doc._get_array_attrs()`. Tuple
`array`	Precomputed array version of the original doc as generated by `Doc.to_array`. numpy.ndarray
RETURNS	A `Doc` object of the `Span`’s content. Doc
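
A sketch of both call styles, assuming a hand-built Doc. The precomputed variant follows the table above: `array_head` comes from `Doc._get_array_attrs()` and `array` from `Doc.to_array`, each computed once and reused for every span.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["I", "like", "New", "York", "in", "Autumn"])

# Simple case: copy one span into its own Doc.
new_doc = doc[2:4].as_doc()

# Many spans from the same Doc: compute the array representation once.
array_head = doc._get_array_attrs()
array = doc.to_array(array_head)
spans = [doc[0:2], doc[2:4], doc[4:6]]
docs = [span.as_doc(array_head=array_head, array=array) for span in spans]
```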

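Span.root property (needs model)

The token with the shortest path to the root of the sentence (or the root itself). If multiple tokens are equally high in the tree, the first token is taken.

Name	Description
RETURNS	The root token. Token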
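
A sketch of root selection, assuming an illustrative dependency parse passed in by hand through `heads` and `deps` instead of a trained parser.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["I", "like", "New", "York", "in", "Autumn"]
heads = [1, 1, 3, 1, 1, 4]  # absolute head indices; the root points at itself
deps = ["nsubj", "ROOT", "compound", "dobj", "prep", "pobj"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# "New" attaches to "York", and "York" attaches outside the span,
# so "York" is the token closest to the sentence root.
new_york_root = doc[2:4].root

# A span that contains the sentence root returns that root.
clause_root = doc[0:4].root
```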

Span.conjuncts property (needs model)

A tuple of tokens coordinated to span.root.

Name	Description
RETURNS	The coordinated tokens. Tuple[Token, …]
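
A sketch of coordination lookup, assuming a hand-written parse in which "oranges" is attached to "apples" with the `conj` label.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["I", "like", "apples", "and", "oranges"]
heads = [1, 1, 1, 2, 2]  # absolute head indices
deps = ["nsubj", "ROOT", "dobj", "cc", "conj"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# The root of the span "apples" is coordinated with "oranges".
apples_conjuncts = [t.text for t in doc[2:3].conjuncts]

# The relation is reported from the other side as well.
oranges_conjuncts = [t.text for t in doc[4:5].conjuncts]
```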

Span.lefts property (needs model)

Tokens that are to the left of the span, whose heads are within the span.

Name	Description
YIELDS	A left-child of a token of the span. Token

Span.rights property (needs model)

Tokens that are to the right of the span, whose heads are within the span.

Name	Description
YIELDS	A right-child of a token of the span. Token

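Span.n_lefts property (needs model)

The number of tokens that are to the left of the span, whose heads are within the span.

Name	Description
RETURNS	The number of left-child tokens. int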

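Span.n_rights property (needs model)

The number of tokens that are to the right of the span, whose heads are within the span.

Name	Description
RETURNS	The number of right-child tokens. int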
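
The four left/right properties can be sketched on one span, assuming an illustrative hand-written parse in place of a trained parser.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["I", "like", "New", "York", "in", "Autumn"]
heads = [1, 1, 3, 1, 1, 4]  # absolute head indices; the root points at itself
deps = ["nsubj", "ROOT", "compound", "dobj", "prep", "pobj"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

span = doc[1:4]  # "like New York"

# "I" sits left of the span and its head "like" is inside it.
lefts = [t.text for t in span.lefts]

# "in" sits right of the span and its head "like" is inside it.
rights = [t.text for t in span.rights]

counts = (span.n_lefts, span.n_rights)
```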

Span.subtree property (needs model)

Tokens within the span and tokens which descend from them.

Name	Description
YIELDS	A token within the span, or a descendant from it. Token
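
A sketch of the subtree of a two-token span, assuming an illustrative hand-written parse. The descendants "New" (of "York") and "Autumn" (of "in") lie outside the span but are included.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["I", "like", "New", "York", "in", "Autumn"]
heads = [1, 1, 3, 1, 1, 4]  # absolute head indices; the root points at itself
deps = ["nsubj", "ROOT", "compound", "dobj", "prep", "pobj"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

span = doc[3:5]  # "York in"
subtree = [t.text for t in span.subtree]
```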

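Span.has_vector property (needs model)

A boolean value indicating whether a word vector is associated with the object.

Name	Description
RETURNS	Whether the span has vector data attached. bool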

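Span.vector property (needs model)

A real-valued meaning representation. Defaults to an average of the token vectors.

Name	Description
RETURNS	A 1-dimensional array representing the span’s vector. numpy.ndarray[ndim=1, dtype=float32]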

Span.vector_norm property (needs model)

The L2 norm of the span’s vector representation.

Name	Description
RETURNS	The L2 norm of the vector representation. float
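
The three vector properties can be sketched with made-up 2-dimensional vectors registered through `Vocab.set_vector`, standing in for the vectors a trained pipeline would provide. The words and values are assumptions picked so the average and norm are easy to verify.

```python
import numpy
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()
# Toy vectors, made up for illustration.
vocab.set_vector("cat", numpy.asarray([1.0, 0.0], dtype="float32"))
vocab.set_vector("dog", numpy.asarray([0.0, 1.0], dtype="float32"))

doc = Doc(vocab, words=["cat", "dog"])
span = doc[0:2]

has_vector = span.has_vector
vector = span.vector   # average of the two token vectors
norm = span.vector_norm  # L2 norm of that average
```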

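Span.sent property (needs model)

The sentence span that this span is a part of. This property is only available when sentence boundaries have been set on the document by the parser, senter, sentencizer or some custom function. It will raise an error otherwise.

If the span happens to cross sentence boundaries, only the first sentence is returned. If the sentence must always include the full span, extend the result so that it ends at the later of the sentence’s end and the span’s end.

Name	Description
RETURNS	The sentence span that this span is a part of. Span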
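
A sketch of the adjustment for spans that cross a sentence boundary, assuming sentence starts set by hand through the `sent_starts` argument of `Doc` rather than by a parser or sentencizer.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["Give", "it", "back", "!", "He", "pleaded", "."]
spaces = [True, True, False, True, True, False, False]
sent_starts = [True, False, False, False, True, False, False]
doc = Doc(Vocab(), words=words, spaces=spaces, sent_starts=sent_starts)

# A span inside one sentence: .sent is that sentence.
inner = doc[1:3]
inner_sent = inner.sent

# A span that crosses the boundary between the two sentences.
crossing = doc[2:6]
sent = crossing.sent
# Extend the result so it always covers the full span.
adjusted = doc[sent.start : max(sent.end, crossing.end)]
```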

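Span.sents property (v3.2.1, needs model)

Returns a generator over the sentences the span belongs to. This property is only available when sentence boundaries have been set on the document by the parser, senter, sentencizer or some custom function. It will raise an error otherwise.

If the span happens to cross sentence boundaries, all sentences the span overlaps with will be returned.

Name	Description
RETURNS	A generator yielding sentences this `Span` is a part of. Iterable[Span]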
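
A sketch of a span that overlaps two sentences, assuming sentence starts set by hand through the `sent_starts` argument of `Doc`.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["Give", "it", "back", "!", "He", "pleaded", "."]
spaces = [True, True, False, True, True, False, False]
sent_starts = [True, False, False, False, True, False, False]
doc = Doc(Vocab(), words=words, spaces=spaces, sent_starts=sent_starts)

# Starts in the first sentence and runs to the end of the second.
span = doc[2:7]
sentences = [sent.text for sent in span.sents]
```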

Attributes

Name	Description
`doc`	The parent document. Doc
`tensor`	The span’s slice of the parent `Doc`’s tensor. numpy.ndarray
`start`	The token offset for the start of the span. int
`end`	The token offset for the end of the span. int
`start_char`	The character offset for the start of the span. int
`end_char`	The character offset for the end of the span. int
`text`	A string representation of the span text. str
`text_with_ws`	The text content of the span with a trailing whitespace character if the last token has one. str
`orth`	ID of the verbatim text content. int
`orth_`	Verbatim text content (identical to `Span.text`). Exists mostly for consistency with the other attributes. str
`label`	The hash value of the span’s label. int
`label_`	The span’s label. str
`lemma_`	The span’s lemma. Equivalent to `"".join(token.lemma_ + token.whitespace_ for token in span).strip()`. str
`kb_id`	The hash value of the knowledge base ID referred to by the span. int
`kb_id_`	The knowledge base ID referred to by the span. str
`ent_id`	The hash value of the named entity the root token is an instance of. int
`ent_id_`	The string ID of the named entity the root token is an instance of. str
`id`	The hash value of the span’s ID. int
`id_`	The span’s ID. str
`sentiment`	A scalar value indicating the positivity or negativity of the span. float
`_`	User space for adding custom attribute extensions. Underscore
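
A few of the attributes above, read from a hand-built span. The sentence and the "EXAMPLE" label are illustrative.

```python
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

words = ["Give", "it", "back", "!", "He", "pleaded", "."]
spaces = [True, True, False, True, True, False, False]
doc = Doc(Vocab(), words=words, spaces=spaces)
span = Span(doc, 1, 4, label="EXAMPLE")

token_offsets = (span.start, span.end)           # token indices, end exclusive
char_offsets = (span.start_char, span.end_char)  # character indices into doc.text

# The character offsets slice the parent text back to the span text.
sliced = doc.text[span.start_char : span.end_char]

# `label` is the hash of the string stored in `label_`.
label_hash = span.label
```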
