Cython Structs
TokenC C struct
Cython data container for the Token object.
| Name | Description |
|---|---|
| lex | A pointer to the lexeme for the token. `const LexemeC*` |
| morph | An ID allowing lookup of morphological attributes. `uint64_t` |
| pos | Coarse-grained part-of-speech tag. `univ_pos_t` |
| spacy | A binary value indicating whether the token has trailing whitespace. `bint` |
| tag | Fine-grained part-of-speech tag. `attr_t` |
| idx | The character offset of the token within the parent document. `int` |
| lemma | Base form of the token, with no inflectional suffixes. `attr_t` |
| sense | Space for storing a word sense ID, currently unused. `attr_t` |
| head | Offset of the syntactic parent relative to the token. `int` |
| dep | Syntactic dependency relation. `attr_t` |
| l_kids | Number of left children. `uint32_t` |
| r_kids | Number of right children. `uint32_t` |
| l_edge | Offset of the leftmost token of this token's syntactic descendants. `uint32_t` |
| r_edge | Offset of the rightmost token of this token's syntactic descendants. `uint32_t` |
| sent_start | Ternary value indicating whether the token is the first word of a sentence. 0 indicates a missing value, -1 indicates False and 1 indicates True. The default value, 0, is interpreted as no sentence break. Sentence boundary detectors will usually set 0 for all tokens except tokens that follow a sentence boundary. `int` |
| ent_iob | IOB code of the named entity tag. 0 indicates a missing value, 1 indicates I, 2 indicates O and 3 indicates B. `int` |
| ent_type | Named entity type. `attr_t` |
| ent_id | ID of the entity the token is an instance of, if any. Currently not used, but potentially useful for coreference resolution. `attr_t` |
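
The integer encodings in the table can be easier to read with a small model. The following is a pure-Python sketch, not spaCy's Cython code: `TokenSketch`, `head_index`, `is_sent_start` and `IOB_CODES` are hypothetical names introduced for illustration, and the root token is assumed to carry a head offset of 0.

```python
from dataclasses import dataclass

# Hypothetical stand-in for three TokenC fields; not spaCy's actual struct.
@dataclass
class TokenSketch:
    head: int = 0        # offset of the syntactic parent relative to this token
    sent_start: int = 0  # 0 = missing, -1 = False, 1 = True
    ent_iob: int = 0     # 0 = missing, 1 = I, 2 = O, 3 = B

# Mapping from the integer IOB code to its letter ("" for a missing value).
IOB_CODES = {0: "", 1: "I", 2: "O", 3: "B"}

def head_index(tokens, i):
    """Absolute index of token i's parent: own index plus the relative offset."""
    return i + tokens[i].head

def is_sent_start(token):
    """Map the ternary sent_start value to True, False, or None (missing)."""
    return {1: True, -1: False}.get(token.sent_start)

# Three tokens whose parent is the middle token (assumed to be the root).
tokens = [
    TokenSketch(head=1, sent_start=1, ent_iob=3),
    TokenSketch(head=0, ent_iob=1),
    TokenSketch(head=-1, sent_start=-1, ent_iob=2),
]
```

Because `head` is relative, the same struct values remain valid when a slice of the array is copied elsewhere, as long as the parent stays inside the slice.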
Token.get_struct_attr staticmethod nogil
Get the value of an attribute from the TokenC struct by attribute ID.
| Name | Description |
|---|---|
| token | A pointer to a TokenC struct. `const TokenC*` |
| feat_name | The ID of the attribute to look up. The attributes are enumerated in spacy.typedefs. `attr_id_t` |
| RETURNS | The value of the attribute. `attr_t` |
Token.set_struct_attr staticmethod nogil
Set the value of an attribute of the TokenC struct by attribute ID.
| Name | Description |
|---|---|
| token | A pointer to a TokenC struct. `const TokenC*` |
| feat_name | The ID of the attribute to set. The attributes are enumerated in spacy.typedefs. `attr_id_t` |
| value | The value to set. `attr_t` |
token_by_start function
Find a token in a `TokenC*` array by the offset of its first character.
| Name | Description |
|---|---|
| tokens | A `TokenC*` array. `const TokenC*` |
| length | The number of tokens in the array. `int` |
| start_char | The start index to search for. `int` |
| RETURNS | The index of the token in the array or -1 if not found. `int` |
token_by_end function
Find a token in a `TokenC*` array by the offset of its final character.
| Name | Description |
|---|---|
| tokens | A `TokenC*` array. `const TokenC*` |
| length | The number of tokens in the array. `int` |
| end_char | The end index to search for. `int` |
| RETURNS | The index of the token in the array or -1 if not found. `int` |
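
The two lookups can be modelled in a few lines of plain Python. This is a minimal sketch of the behaviour described in the tables, not the library's implementation: each token is reduced to an `(idx, length)` pair, a linear scan stands in for whatever search spaCy actually uses, and the end offset is assumed to be exclusive, i.e. `idx + length`.

```python
from typing import List, Tuple

# Each token is modelled as (idx, length): its character offset in the
# document and the number of characters in its lexeme.
TokenPair = Tuple[int, int]

def token_by_start(tokens: List[TokenPair], start_char: int) -> int:
    """Return the index of the token starting at start_char, or -1."""
    for i, (idx, _length) in enumerate(tokens):
        if idx == start_char:
            return i
    return -1

def token_by_end(tokens: List[TokenPair], end_char: int) -> int:
    """Return the index of the token ending at end_char (exclusive), or -1."""
    for i, (idx, length) in enumerate(tokens):
        if idx + length == end_char:
            return i
    return -1

# "Hello world!" split into "Hello" (0-5), "world" (6-11) and "!" (11-12).
tokens = [(0, 5), (6, 5), (11, 1)]
```

An offset that falls inside a token or on whitespace, such as 5 for the start lookup here, matches no token boundary and yields -1.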
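set_children_from_heads function
Set attributes that allow lookup of syntactic children on a `TokenC*` array.
This function must be called after making changes to the TokenC.head attribute, in order to ensure that the parse tree can be navigated consistently.
| Name | Description |
|---|---|
| tokens | A `TokenC*` array. `const TokenC*` |
| length | The number of tokens in the array. `int` |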
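
To make concrete which attributes depend on `head`, here is a hedged pure-Python sketch that derives `l_kids`, `r_kids`, `l_edge` and `r_edge` from a list of relative head offsets. It is an illustration of the idea, not spaCy's implementation; in particular it assumes the root has offset 0 and stores the edges as absolute array indices.

```python
def set_children_from_heads(heads):
    """Derive child counts and subtree edges from relative head offsets.

    heads[i] is the offset from token i to its parent (assumed 0 for a root).
    Returns (l_kids, r_kids, l_edge, r_edge) as lists of ints.
    """
    n = len(heads)
    l_kids = [0] * n
    r_kids = [0] * n
    # Every token starts out as its own leftmost and rightmost descendant.
    l_edge = list(range(n))
    r_edge = list(range(n))

    # Count children on each side of every parent.
    for i, offset in enumerate(heads):
        parent = i + offset
        if parent > i:
            l_kids[parent] += 1  # token i sits to the left of its parent
        elif parent < i:
            r_kids[parent] += 1  # token i sits to the right of its parent

    # Propagate edges upwards until nothing changes any more.
    changed = True
    while changed:
        changed = False
        for i, offset in enumerate(heads):
            parent = i + offset
            if l_edge[i] < l_edge[parent]:
                l_edge[parent] = l_edge[i]
                changed = True
            if r_edge[i] > r_edge[parent]:
                r_edge[parent] = r_edge[i]
                changed = True
    return l_kids, r_kids, l_edge, r_edge

# "The cat sat down": The -> cat, cat -> sat, sat is the root, down -> sat.
l_kids, r_kids, l_edge, r_edge = set_children_from_heads([1, 1, 0, -1])
```

All four derived fields become stale as soon as a single `head` value changes, which is why they need to be recomputed after such an edit.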
LexemeC C struct
Struct holding information about a lexical type. LexemeC structs are usually owned by the Vocab, and accessed through a read-only pointer on the TokenC struct.
| Name | Description |
|---|---|
| flags | Bit-field for binary lexical flag values. `flags_t` |
| id | Usually used to map lexemes to rows in a matrix, e.g. for word vectors. Does not need to be unique, so currently misnamed. `attr_t` |
| length | Number of Unicode characters in the lexeme. `attr_t` |
| orth | ID of the verbatim text content. `attr_t` |
| lower | ID of the lowercase form of the lexeme. `attr_t` |
| norm | ID of the lexeme's norm, i.e. a normalized form of the text. `attr_t` |
| shape | Transform of the lexeme's string, to show orthographic features. `attr_t` |
| prefix | Length-N substring from the start of the lexeme. Defaults to N=1. `attr_t` |
| suffix | Length-N substring from the end of the lexeme. Defaults to N=3. `attr_t` |
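
As a rough illustration of what the string-valued fields describe, the sketch below computes them as plain Python strings. Two assumptions apply: the real struct stores IDs rather than the strings themselves, and the shape rule used here (uppercase letter to `X`, lowercase letter to `x`, digit to `d`, everything else unchanged) is a simplification that may differ from spaCy's transform, for example in how long runs of the same character class are handled. `lexeme_features` is a hypothetical helper name.

```python
def lexeme_features(text, prefix_len=1, suffix_len=3):
    """Compute string versions of several LexemeC fields for one word."""

    def shape_char(ch):
        # Simplified shape rule, assumed for illustration only.
        if ch.isalpha():
            return "X" if ch.isupper() else "x"
        if ch.isdigit():
            return "d"
        return ch

    return {
        "length": len(text),
        "orth": text,
        "lower": text.lower(),
        "shape": "".join(shape_char(ch) for ch in text),
        "prefix": text[:prefix_len],   # length-N substring from the start
        "suffix": text[-suffix_len:],  # length-N substring from the end
    }

features = lexeme_features("Apple2")
```

For words shorter than the suffix length, the slice simply returns the whole word.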
Lexeme.get_struct_attr staticmethod nogil
Get the value of an attribute from the LexemeC struct by attribute ID.
| Name | Description |
|---|---|
| lex | A pointer to a LexemeC struct. `const LexemeC*` |
| feat_name | The ID of the attribute to look up. The attributes are enumerated in spacy.typedefs. `attr_id_t` |
| RETURNS | The value of the attribute. `attr_t` |
Lexeme.set_struct_attr staticmethod nogil
Set the value of an attribute of the LexemeC struct by attribute ID.
| Name | Description |
|---|---|
| lex | A pointer to a LexemeC struct. `const LexemeC*` |
| feat_name | The ID of the attribute to set. The attributes are enumerated in spacy.typedefs. `attr_id_t` |
| value | The value to set. `attr_t` |
Lexeme.c_check_flag staticmethod nogil
Check the value of a binary flag attribute.
| Name | Description |
|---|---|
| lexeme | A pointer to a LexemeC struct. `const LexemeC*` |
| flag_id | The ID of the flag to look up. The flag IDs are enumerated in spacy.typedefs. `attr_id_t` |
| RETURNS | The boolean value of the flag. `bint` |
Lexeme.c_set_flag staticmethod nogil
Set the value of a binary flag attribute.
| Name | Description |
|---|---|
| lexeme | A pointer to a LexemeC struct. `const LexemeC*` |
| flag_id | The ID of the flag to set. The flag IDs are enumerated in spacy.typedefs. `attr_id_t` |
| value | The value to set. `bint` |
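
Since `flags` is described as a bit-field, checking and setting a flag presumably comes down to masking a single bit. The following pure-Python sketch shows that technique on a plain integer; it assumes that flag ID `n` corresponds to bit `n`, and `check_flag` and `set_flag` are hypothetical helpers rather than the Cython methods themselves.

```python
def check_flag(flags: int, flag_id: int) -> bool:
    """Return True if the bit for flag_id is set in the bit-field."""
    return bool(flags & (1 << flag_id))

def set_flag(flags: int, flag_id: int, value: bool) -> int:
    """Return a copy of the bit-field with the bit for flag_id set or cleared."""
    mask = 1 << flag_id
    if value:
        return flags | mask   # switch the bit on
    return flags & ~mask      # switch the bit off

# Flag ID 3 is an arbitrary, hypothetical ID chosen for the example.
flags = set_flag(0, 3, True)
cleared = set_flag(flags, 3, False)
```

Packing all binary flags into one integer keeps the struct small, and both operations stay constant-time regardless of how many flags are defined.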