Cython Classes
Doc cdef class
The Doc object holds an array of TokenC structs.
Attributes
| Name | Description | Type |
|---|---|---|
| mem | A memory pool. Allocated memory will be freed once the Doc object is garbage collected. | cymem.Pool |
| vocab | A reference to the shared Vocab object. | Vocab |
| c | A pointer to a TokenC struct. | TokenC* |
| length | The number of tokens in the document. | int |
| max_length | The underlying size of the Doc.c array. | int |
Doc.push_back method
Append a token to the Doc. The token can be provided as a LexemeC or TokenC pointer, using Cython's fused types.
| Name | Description | Type |
|---|---|---|
| lex_or_tok | The word to append to the Doc. | LexemeOrToken |
| has_space | Whether the word has trailing whitespace. | bint |
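To illustrate how length, max_length, and push_back relate, here is a minimal pure-Python sketch of a growable token buffer. It is not spaCy's implementation: the `TokenBuffer` class, its doubling growth policy, and the `(word, has_space)` tuples standing in for TokenC structs are all assumptions made for this example.

```python
class TokenBuffer:
    """Hypothetical stand-in for Doc: a preallocated array plus a token count."""

    def __init__(self, max_length=2):
        self.max_length = max_length      # capacity of the underlying array
        self.length = 0                   # number of tokens actually stored
        self.c = [None] * max_length      # plays the role of the TokenC array

    def push_back(self, word, has_space):
        # Grow the array when it is full (doubling is an assumption of this sketch).
        if self.length == self.max_length:
            self.c.extend([None] * self.max_length)
            self.max_length *= 2
        self.c[self.length] = (word, has_space)
        self.length += 1

    @property
    def text(self):
        # Rebuild the text from each word and its trailing-whitespace flag.
        return "".join(w + (" " if sp else "") for w, sp in self.c[:self.length])


buf = TokenBuffer()
for word, has_space in [("Hello", True), ("world", False), ("!", False)]:
    buf.push_back(word, has_space)
```

In this sketch, length counts the tokens stored so far, while max_length only changes when the backing array has to grow.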
Token cdef class
A Cython class providing access and methods for a TokenC struct. Note that the Token object does not own the struct: it only receives a pointer to it.
Attributes
| Name | Description | Type |
|---|---|---|
| vocab | A reference to the shared Vocab object. | Vocab |
| c | A pointer to a TokenC struct. | TokenC* |
| i | The offset of the token within the document. | int |
| doc | The parent document. | Doc |
Token.cinit method
Create a Token object from a TokenC* pointer.
| Name | Description | Type |
|---|---|---|
| vocab | A reference to the shared Vocab. | Vocab |
| c | A pointer to a TokenC struct. | TokenC* |
| offset | The offset of the token within the document. | int |
| doc | The parent document. | Doc |
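The ownership note above (a Token wraps a struct it does not own) can be sketched in pure Python as a view object that borrows its parent's storage. The `TokenView` class and the list of dictionaries standing in for the TokenC array are hypothetical names chosen for this illustration, not part of spaCy.

```python
class TokenView:
    """Hypothetical stand-in for Token: a view that borrows its parent's storage."""

    def __init__(self, storage, offset):
        self.storage = storage   # borrowed reference; the parent owns the data
        self.i = offset          # offset of the token within the document

    @classmethod
    def cinit(cls, storage, offset):
        # Build a wrapper around existing data; nothing is copied.
        return cls(storage, offset)

    @property
    def text(self):
        # Always read through to the parent's storage.
        return self.storage[self.i]["text"]


doc_storage = [{"text": "Hello"}, {"text": "world"}]   # owned by the "document"
token = TokenView.cinit(doc_storage, 1)
doc_storage[1]["text"] = "there"                       # mutate through the owner
```

Because the view copies nothing, a change made through the owner is visible through the view, and the view is only meaningful while the owner's storage exists.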
Span cdef class
A Cython class providing access and methods for a slice of a Doc object.
Attributes
| Name | Description | Type |
|---|---|---|
| doc | The parent document. | Doc |
| start | The index of the first token of the span. | int |
| end | The index of the first token after the span. | int |
| start_char | The character offset of the start of the span. | int |
| end_char | The character offset of the end of the span. | int |
| label | A label to attach to the span, e.g. for named entities. | attr_t |
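As a hedged illustration of how the token indices (start, end) relate to the character offsets (start_char, end_char), the sketch below computes both for a small hand-built token list. The `span_offsets` helper and the `words`/`spaces` lists are hypothetical. The sketch assumes the end offset is exclusive, so that slicing the text with the two offsets yields the span's text.

```python
words = ["Give", "it", "back", "!"]
spaces = [True, True, False, False]   # whether each word has trailing whitespace

# Character offset at which each token starts in the text.
idx = []
pos = 0
for w, sp in zip(words, spaces):
    idx.append(pos)
    pos += len(w) + (1 if sp else 0)

text = "".join(w + (" " if sp else "") for w, sp in zip(words, spaces))


def span_offsets(start, end):
    """Return (start_char, end_char) for tokens start..end-1 (hypothetical helper)."""
    start_char = idx[start]
    # End offset is exclusive and leaves out the last token's trailing whitespace.
    end_char = idx[end - 1] + len(words[end - 1])
    return start_char, end_char


start_char, end_char = span_offsets(1, 3)
span_text = text[start_char:end_char]
```

Here end is the index of the first token after the span, which mirrors the half-open convention of Python slices.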
Lexeme cdef class
A Cython class providing access and methods for an entry in the vocabulary.
Attributes
| Name | Description | Type |
|---|---|---|
| c | A pointer to a LexemeC struct. | LexemeC* |
| vocab | A reference to the shared Vocab object. | Vocab |
| orth | ID of the verbatim text content. | attr_t |
Vocab cdef class
A Cython class providing access and methods for a vocabulary and other data shared across a language.
Attributes
| Name | Description | Type |
|---|---|---|
| mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. | cymem.Pool |
| strings | A StringStore that maps strings to hash values and vice versa. | StringStore |
| length | The number of entries in the vocabulary. | int |
Vocab.get method
Retrieve a LexemeC* pointer from the vocabulary, looked up by string.
| Name | Description | Type |
|---|---|---|
| mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. | cymem.Pool |
| string | The string of the word to look up. | str |
| RETURNS | The lexeme in the vocabulary. | const LexemeC* |
Vocab.get_by_orth method
Retrieve a LexemeC* pointer from the vocabulary, looked up by orth ID.
| Name | Description | Type |
|---|---|---|
| mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. | cymem.Pool |
| orth | ID of the verbatim text content. | attr_t |
| RETURNS | The lexeme in the vocabulary. | const LexemeC* |
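The two lookup paths (by string and by orth ID) can be pictured with the following pure-Python sketch. It is an illustration under stated assumptions rather than spaCy's code: `MiniVocab`, `hash_string`, the BLAKE2b-based 64-bit hash, and the choice to create an entry for an unseen ID are all hypothetical. The memory pool argument is left out because Python manages memory itself.

```python
import hashlib


def hash_string(string):
    # 64-bit ID derived with BLAKE2b; a stand-in hash chosen for this sketch.
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class MiniVocab:
    """Hypothetical stand-in for Vocab: entries keyed by a 64-bit orth ID."""

    def __init__(self):
        self.strings = {}   # orth ID -> string
        self.lexemes = {}   # orth ID -> entry (a dict standing in for LexemeC)

    def get(self, string):
        # Look up by string: hash it, remember the string, then look up by ID.
        orth = hash_string(string)
        self.strings[orth] = string
        return self.get_by_orth(orth)

    def get_by_orth(self, orth):
        # Assumption of this sketch: an unseen ID gets a fresh entry.
        if orth not in self.lexemes:
            self.lexemes[orth] = {"orth": orth}
        return self.lexemes[orth]

    @property
    def length(self):
        return len(self.lexemes)


vocab = MiniVocab()
by_string = vocab.get("hello")
by_orth = vocab.get_by_orth(by_string["orth"])
```

Both calls resolve to the same entry, so a caller holding only the ID does not need the original string.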
StringStore cdef class
A lookup table to retrieve strings by 64-bit hashes.
Attributes
| Name | Description | Type |
|---|---|---|
| mem | A memory pool. Allocated memory will be freed once the StringStore object is garbage collected. | cymem.Pool |
| keys | A list of hash values in the StringStore. | vector[hash_t] |
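A lookup table of this kind can be sketched in a few lines of pure Python. The `MiniStringStore` class and `hash64` helper below are hypothetical, and BLAKE2b with an 8-byte digest is swapped in here only as a convenient standard-library way to get a 64-bit value. It is not claimed to be the hash function the library uses.

```python
import hashlib


def hash64(string):
    # 64-bit hash via BLAKE2b; a stand-in chosen for this sketch.
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class MiniStringStore:
    """Hypothetical two-way table between strings and 64-bit hashes."""

    def __init__(self):
        self._strings = {}   # hash -> string
        self.keys = []       # hashes in insertion order

    def add(self, string):
        key = hash64(string)
        if key not in self._strings:
            self._strings[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, item):
        # A string maps to its hash; a hash maps back to the stored string.
        if isinstance(item, str):
            return hash64(item)
        return self._strings[item]


store = MiniStringStore()
key = store.add("coffee")
store.add("coffee")          # adding the same string again is a no-op
```

Storing only the 64-bit key elsewhere and resolving it through the table on demand is the pattern the attributes above suggest.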