Cython Classes · spaCy API Documentation

Doc cdef class
Source

The `Doc` object holds an array of `TokenC` structs.

Attributes

Name	Description	Type
`mem`	A memory pool. Allocated memory will be freed once the `Doc` object is garbage collected.	cymem.Pool
`vocab`	A reference to the shared `Vocab` object.	Vocab
`c`	A pointer to a `TokenC` struct.	TokenC*
`length`	The number of tokens in the document.	int
`max_length`	The underlying size of the `Doc.c` array.	int

Doc.push_back method

Append a token to the `Doc`. The token can be provided as a `LexemeC` or `TokenC` pointer, using Cython's fused types.

Name	Description	Type
`lex_or_tok`	The word to append to the `Doc`.	LexemeOrToken
`has_space`	Whether the word has trailing whitespace.	bint
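
The difference between `length` and `max_length` can be sketched in plain Python. This is a toy model only, not spaCy's implementation: the `ToyDoc` class, the initial capacity of 2, and the doubling growth policy are all assumptions made for illustration.

```python
class ToyDoc:
    """Toy model of a token buffer with a logical length and a capacity."""

    def __init__(self, initial_capacity=2):
        self.max_length = initial_capacity   # size of the underlying array
        self.length = 0                      # number of tokens actually stored
        self.c = [None] * self.max_length    # stands in for the TokenC array

    def push_back(self, word, has_space):
        # Grow the backing array when it is full (doubling is an assumption).
        if self.length == self.max_length:
            self.c.extend([None] * self.max_length)
            self.max_length *= 2
        self.c[self.length] = (word, has_space)
        self.length += 1


doc = ToyDoc()
for word, has_space in [("Hello", True), ("world", False), ("!", False)]:
    doc.push_back(word, has_space)

# Rebuild the text from the stored words and their trailing-space flags.
text = "".join(w + (" " if s else "") for w, s in doc.c[:doc.length])
```

Only the first `length` slots hold tokens; the remaining slots up to `max_length` are spare capacity.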

Token cdef class
Source

A Cython class providing access and methods for a `TokenC` struct. Note that the `Token` object does not own the struct. It only receives a pointer to it.

Attributes

Name	Description	Type
`vocab`	A reference to the shared `Vocab` object.	Vocab
`c`	A pointer to a `TokenC` struct.	TokenC*
`i`	The offset of the token within the document.	int
`doc`	The parent document.	Doc
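
What it means for a `Token` not to own its struct can be illustrated with a small Python analogy. The `ToyToken` class and the dictionary records below are hypothetical stand-ins, not spaCy types: the token keeps a reference to a record stored elsewhere, so changes to that record are visible through the token.

```python
class ToyToken:
    """Toy view onto a record owned by someone else (here, a list)."""

    def __init__(self, records, i):
        self.c = records[i]   # a reference to the existing record, not a copy
        self.i = i            # offset of the token within the document


# The list plays the role of the document's token array.
records = [{"orth": "Hello"}, {"orth": "world"}]
token = ToyToken(records, 1)

# Mutating the owner's record is observable through the view.
records[1]["orth"] = "World"
seen_through_token = token.c["orth"]
```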

Token.cinit method

Create a `Token` object from a `TokenC*` pointer.

Name	Description	Type
`vocab`	A reference to the shared `Vocab`.	Vocab
`c`	A pointer to a `TokenC` struct.	TokenC*
`offset`	The offset of the token within the document.	int
`doc`	The parent document.	Doc

Span cdef class
Source

A Cython class providing access and methods for a slice of a `Doc` object.

Attributes

Name	Description	Type
`doc`	The parent document.	Doc
`start`	The index of the first token of the span.	int
`end`	The index of the first token after the span.	int
`start_char`	The index of the first character of the span.	int
`end_char`	The index of the first character after the span.	int
`label`	A label to attach to the span, e.g. for named entities.	attr_t (uint64_t)
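
The relationship between token indices and character offsets can be sketched with plain Python lists. This is an illustration under stated assumptions, not spaCy code: the helper `span_bounds` is hypothetical, and the character offsets are assumed to follow the same half-open convention as `start` and `end`.

```python
words = ["Give", "it", "back", "!"]
spaces = [True, True, False, False]   # whether each word has trailing whitespace

# Character offset of each token within the reconstructed text.
offsets = []
pos = 0
for word, space in zip(words, spaces):
    offsets.append(pos)
    pos += len(word) + (1 if space else 0)

text = "".join(w + (" " if s else "") for w, s in zip(words, spaces))


def span_bounds(start, end):
    """Return (start_char, end_char) for the token slice [start, end)."""
    start_char = offsets[start]
    # End of the last token in the slice, excluding its trailing whitespace.
    end_char = offsets[end - 1] + len(words[end - 1])
    return start_char, end_char


start_char, end_char = span_bounds(1, 3)
span_text = text[start_char:end_char]
```

With half-open offsets, slicing the text by `start_char:end_char` yields exactly the text of the span.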

Lexeme cdef class
Source

A Cython class providing access and methods for an entry in the vocabulary.

Attributes

Name	Description	Type
`c`	A pointer to a `LexemeC` struct.	LexemeC*
`vocab`	A reference to the shared `Vocab` object.	Vocab
`orth`	ID of the verbatim text content.	attr_t (uint64_t)

Vocab cdef class
Source

A Cython class providing access and methods for a vocabulary and other data shared across a language.

Attributes

Name	Description	Type
`mem`	A memory pool. Allocated memory will be freed once the `Vocab` object is garbage collected.	cymem.Pool
`strings`	A `StringStore` that maps strings to hash values and vice versa.	StringStore
`length`	The number of entries in the vocabulary.	int

Vocab.get method

Retrieve a `LexemeC*` pointer from the vocabulary.

Name	Description	Type
`mem`	A memory pool. Allocated memory will be freed once the `Vocab` object is garbage collected.	cymem.Pool
`string`	The string of the word to look up.	str
RETURNS	The lexeme in the vocabulary.	const LexemeC*

Vocab.get_by_orth method

Retrieve a `LexemeC*` pointer from the vocabulary by its `orth` ID.

Name	Description	Type
`mem`	A memory pool. Allocated memory will be freed once the `Vocab` object is garbage collected.	cymem.Pool
`orth`	ID of the verbatim text content.	attr_t (uint64_t)
RETURNS	The lexeme in the vocabulary.	const LexemeC*

StringStore cdef class
Source

A lookup table to retrieve strings by 64-bit hashes.

Attributes

Name	Description	Type
`mem`	A memory pool. Allocated memory will be freed once the `StringStore` object is garbage collected.	cymem.Pool
`keys`	A list of hash values in the `StringStore`.	vector[hash_t] (vector[uint64_t])
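
The idea of a table keyed by 64-bit hashes can be sketched in pure Python. This is a toy model, not spaCy's `StringStore`: the `ToyStringStore` class and its methods are hypothetical, and `hashlib.blake2b` with an 8-byte digest is used here only as a convenient stand-in for a 64-bit hash function.

```python
import hashlib


class ToyStringStore:
    """Toy two-way lookup between strings and 64-bit hash values."""

    def __init__(self):
        self._by_hash = {}   # hash value -> string
        self.keys = []       # hash values, in insertion order

    @staticmethod
    def hash_string(string):
        # 8-byte digest gives a value in the uint64 range (assumed hash function).
        digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def add(self, string):
        key = self.hash_string(string)
        if key not in self._by_hash:
            self._by_hash[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, item):
        # Strings map to hashes; hashes map back to the stored string.
        if isinstance(item, str):
            return self.hash_string(item)
        return self._by_hash[item]


store = ToyStringStore()
coffee_hash = store.add("coffee")
store.add("coffee")   # adding the same string again does not create a new key
```

Storing only the hash in other structures keeps them compact, while the table allows the original string to be recovered when needed.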
