Cython

Cython Classes

Doc cdef class

The Doc object holds an array of TokenC structs.

Attributes

| Name | Type | Description |
| --- | --- | --- |
| mem | cymem.Pool | A memory pool. Allocated memory is freed once the Doc object is garbage collected. |
| vocab | Vocab | A reference to the shared Vocab object. |
| c | TokenC* | A pointer to a TokenC struct. |
| length | int | The number of tokens in the document. |
| max_length | int | The underlying size of the Doc.c array. |

Doc.push_back method

Append a token to the Doc. The token can be provided as a LexemeC or TokenC pointer, using Cython's fused types.

| Name | Type | Description |
| --- | --- | --- |
| lex_or_tok | LexemeOrToken | The word to append to the Doc. |
| has_space | bint | Whether the word has trailing whitespace. |
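
The relationship between length, max_length and push_back can be illustrated with a small pure-Python model. This is only a sketch, not spaCy's implementation: ToyDoc is a hypothetical class, a Python list stands in for the TokenC array, and the doubling growth policy is an assumption that the reference above does not state.

```python
class ToyDoc:
    """Pure-Python stand-in for a Doc backed by a fixed-capacity token array."""

    def __init__(self, max_length=2):
        self.max_length = max_length      # capacity of the underlying array
        self.length = 0                   # number of tokens actually stored
        self.c = [None] * max_length      # stands in for the TokenC array

    def push_back(self, word, has_space):
        # Grow the backing array when it is full (doubling is an assumption).
        if self.length == self.max_length:
            self.c.extend([None] * self.max_length)
            self.max_length *= 2
        self.c[self.length] = (word, has_space)
        self.length += 1

    @property
    def text(self):
        # Rebuild the text from each word and its trailing-whitespace flag.
        stored = self.c[:self.length]
        return "".join(word + (" " if space else "") for word, space in stored)


doc = ToyDoc()
for word, has_space in [("Hello", True), ("world", False), ("!", False)]:
    doc.push_back(word, has_space)
```

After the three calls, doc.length counts the stored tokens while doc.max_length reports the larger capacity of the backing array, which mirrors the distinction drawn in the attributes table.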

Token cdef class

A Cython class providing access and methods for a TokenC struct. Note that the Token object does not own the struct. It only receives a pointer to it.

Attributes

| Name | Type | Description |
| --- | --- | --- |
| vocab | Vocab | A reference to the shared Vocab object. |
| c | TokenC* | A pointer to a TokenC struct. |
| i | int | The offset of the token within the document. |
| doc | Doc | The parent document. |

Token.cinit method

Create a Token object from a TokenC* pointer.

| Name | Type | Description |
| --- | --- | --- |
| vocab | Vocab | A reference to the shared Vocab. |
| c | TokenC* | A pointer to a TokenC struct. |
| offset | int | The offset of the token within the document. |
| doc | Doc | The parent document. |
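
The point that a Token only receives a pointer to the struct, and does not own it, can be sketched in plain Python. The names ToyToken and storage are hypothetical, and a shared list of dicts stands in for the TokenC array owned by the Doc. This is an analogy for the ownership model, not spaCy code.

```python
class ToyToken:
    """View onto one slot of storage that is owned by someone else."""

    def __init__(self, storage, offset):
        self._storage = storage   # shared reference, not a copy
        self.i = offset           # offset of the token within the document

    @property
    def c(self):
        # Look the record up on every access, the way a pointer would be followed.
        return self._storage[self.i]


# The "document" owns the records; the token only refers to one of them.
storage = [{"orth": "Hello"}, {"orth": "world"}]
token = ToyToken(storage, 1)

# A change made through the owner is visible through the token view.
storage[1]["orth"] = "there"
```

Because the view holds no data of its own, it is only meaningful for as long as the owning storage is alive, which is why the real Token also keeps a reference to its parent Doc.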

Span cdef class

A Cython class providing access and methods for a slice of a Doc object.

Attributes

| Name | Type | Description |
| --- | --- | --- |
| doc | Doc | The parent document. |
| start | int | The index of the first token of the span. |
| end | int | The index of the first token after the span. |
| start_char | int | The index of the first character of the span. |
| end_char | int | The index of the last character of the span. |
| label | attr_t (uint64_t) | A label to attach to the span, e.g. for named entities. |
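
Since start is the first token of the span and end is the first token after it, the token range is half-open, just like a Python slice. The sketch below shows that convention with a hypothetical ToySpan dataclass over a plain list of words. It models only the token indices, not the character offsets or spaCy's actual class.

```python
from dataclasses import dataclass


@dataclass
class ToySpan:
    doc: list       # stands in for the parent Doc
    start: int      # index of the first token of the span
    end: int        # index of the first token after the span
    label: int = 0  # stands in for the attr_t label ID

    def __len__(self):
        # Half-open range: the token at index `end` is not part of the span.
        return self.end - self.start

    def tokens(self):
        return self.doc[self.start:self.end]


words = ["Give", "it", "back", "!"]
span = ToySpan(words, 1, 3)
```

Here span.tokens() covers the words at indices 1 and 2, and words[span.end] is the first word that lies outside the span.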

Lexeme cdef class

A Cython class providing access and methods for an entry in the vocabulary.

Attributes

| Name | Type | Description |
| --- | --- | --- |
| c | LexemeC* | A pointer to a LexemeC struct. |
| vocab | Vocab | A reference to the shared Vocab object. |
| orth | attr_t (uint64_t) | ID of the verbatim text content. |

Vocab cdef class

A Cython class providing access and methods for a vocabulary and other data shared across a language.

Attributes

| Name | Type | Description |
| --- | --- | --- |
| mem | cymem.Pool | A memory pool. Allocated memory is freed once the Vocab object is garbage collected. |
| strings | StringStore | A StringStore that maps strings to hash values and vice versa. |
| length | int | The number of entries in the vocabulary. |

Vocab.get method

Retrieve a LexemeC* pointer from the vocabulary, looked up by string.

| Name | Type | Description |
| --- | --- | --- |
| mem | cymem.Pool | A memory pool. Allocated memory is freed once the Vocab object is garbage collected. |
| string | str | The string of the word to look up. |

Vocab.get_by_orth method

Retrieve a LexemeC* pointer from the vocabulary, looked up by orth ID.

| Name | Type | Description |
| --- | --- | --- |
| mem | cymem.Pool | A memory pool. Allocated memory is freed once the Vocab object is garbage collected. |
| orth | attr_t (uint64_t) | ID of the verbatim text content. |
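
The two lookups differ only in their key: one takes the word's string, the other its orth ID. A rough pure-Python model of that relationship follows. Everything in it is an assumption made for illustration: ToyVocab and hash64 are hypothetical names, BLAKE2b stands in for whatever hash the library uses, the mem argument is left out, and creating a missing entry on first lookup is a guess that the reference above does not confirm.

```python
import hashlib


def hash64(text):
    """Hypothetical 64-bit string hash (BLAKE2b truncated to 8 bytes)."""
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyVocab:
    def __init__(self):
        self._by_orth = {}   # orth ID -> entry (stands in for LexemeC structs)

    def get_by_orth(self, orth):
        # Create the entry on first request (an assumption of this sketch).
        if orth not in self._by_orth:
            self._by_orth[orth] = {"orth": orth}
        return self._by_orth[orth]

    def get(self, string):
        # A string lookup reduces to an ID lookup after hashing.
        return self.get_by_orth(hash64(string))

    @property
    def length(self):
        return len(self._by_orth)


vocab = ToyVocab()
by_string = vocab.get("hello")
by_orth = vocab.get_by_orth(by_string["orth"])
```

In this model both calls hand back the same stored entry, so a word is represented once no matter which key was used to reach it.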

StringStore cdef class

A lookup table to retrieve strings by 64-bit hashes.

Attributes

| Name | Type | Description |
| --- | --- | --- |
| mem | cymem.Pool | A memory pool. Allocated memory is freed once the StringStore object is garbage collected. |
| keys | vector[hash_t] (vector[uint64_t]) | A list of hash values in the StringStore. |
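
The idea of a table that maps 64-bit hashes back to strings, with keys recording the stored hash values, can be sketched as follows. ToyStringStore, its add method and hash64 are hypothetical names, and BLAKE2b truncated to 8 bytes is an assumed stand-in for the real hash function, so the hash values here will not match spaCy's.

```python
import hashlib


def hash64(text):
    """Hypothetical 64-bit string hash (BLAKE2b truncated to 8 bytes)."""
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    def __init__(self):
        self._by_hash = {}   # hash value -> original string
        self.keys = []       # hash values in insertion order

    def add(self, text):
        key = hash64(text)
        if key not in self._by_hash:
            self._by_hash[key] = text
            self.keys.append(key)
        return key

    def __getitem__(self, item):
        # Strings map to their hash; hashes map back to the stored string.
        if isinstance(item, str):
            return hash64(item)
        return self._by_hash[item]


store = ToyStringStore()
coffee = store.add("coffee")
repeat = store.add("coffee")   # adding the same string again is a no-op
```

The hash of a string can always be computed, but going from a hash back to its string only works for strings that were added, which is why the table has to be kept alongside the hashed IDs.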