Cython

Cython Structs

C-language objects that let you group variables together.

TokenC (C struct)

Cython data container for the Token object.

| Name | Description | Type |
| --- | --- | --- |
| lex | A pointer to the lexeme for the token. | const LexemeC* |
| morph | An ID allowing lookup of morphological attributes. | uint64_t |
| pos | Coarse-grained part-of-speech tag. | univ_pos_t |
| spacy | A binary value indicating whether the token has trailing whitespace. | bint |
| tag | Fine-grained part-of-speech tag. | attr_t (uint64_t) |
| idx | The character offset of the token within the parent document. | int |
| lemma | Base form of the token, with no inflectional suffixes. | attr_t (uint64_t) |
| sense | Space for storing a word sense ID, currently unused. | attr_t (uint64_t) |
| head | Offset of the syntactic parent relative to the token. | int |
| dep | Syntactic dependency relation. | attr_t (uint64_t) |
| l_kids | Number of left children. | uint32_t |
| r_kids | Number of right children. | uint32_t |
| l_edge | Offset of the leftmost token of this token's syntactic descendants. | uint32_t |
| r_edge | Offset of the rightmost token of this token's syntactic descendants. | uint32_t |
| sent_start | Ternary value indicating whether the token is the first word of a sentence. 0 indicates a missing value, -1 indicates False and 1 indicates True. The default value, 0, is interpreted as no sentence break. Sentence boundary detectors will usually set 0 for all tokens except tokens that follow a sentence boundary. | int |
| ent_iob | IOB code of the named entity tag. 0 indicates a missing value, 1 indicates I, 2 indicates O and 3 indicates B. | int |
| ent_type | Named entity type. | attr_t (uint64_t) |
| ent_id | ID of the entity the token is an instance of, if any. Currently not used, but potentially useful for coreference resolution. | attr_t (uint64_t) |
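
The integer codes used by sent_start and ent_iob are easy to mix up, so a small decoding helper can make them easier to read. The following is a minimal pure-Python sketch of the codes listed in the table above; the function names are hypothetical and are not part of spaCy.

```python
from typing import Optional

# ent_iob codes as listed in the table: 0 = missing, 1 = I, 2 = O, 3 = B.
ENT_IOB_LABELS = {0: "", 1: "I", 2: "O", 3: "B"}


def decode_ent_iob(code: int) -> str:
    """Map an ent_iob code to its IOB letter ("" means missing)."""
    return ENT_IOB_LABELS[code]


def decode_sent_start(code: int) -> Optional[bool]:
    """Map the ternary sent_start code to True, False or None (missing)."""
    if code == 1:
        return True
    if code == -1:
        return False
    return None


def starts_sentence(code: int) -> bool:
    """Treat the default value 0 as "no sentence break", as the table describes."""
    return code == 1
```

For example, decode_ent_iob(3) gives "B", while both decode_sent_start(0) and starts_sentence(0) report that no sentence break is known at that token.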

Token.get_struct_attr (staticmethod, nogil)

Get the value of an attribute from the TokenC struct by attribute ID.

| Name | Description | Type |
| --- | --- | --- |
| token | A pointer to a TokenC struct. | const TokenC* |
| feat_name | The ID of the attribute to look up. The attributes are enumerated in spacy.typedefs. | attr_id_t |

Token.set_struct_attr (staticmethod, nogil)

Set the value of an attribute of the TokenC struct by attribute ID.

| Name | Description | Type |
| --- | --- | --- |
| token | A pointer to a TokenC struct. | const TokenC* |
| feat_name | The ID of the attribute to set. The attributes are enumerated in spacy.typedefs. | attr_id_t |
| value | The value to set. | attr_t (uint64_t) |
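
To illustrate the idea of reading and writing struct fields by attribute ID, here is a rough pure-Python model. It is only a sketch: the AttrId values, the TokenModel class and the fallback of returning 0 for an unknown ID are all assumptions made for illustration, and the real methods operate on C structs without the GIL.

```python
from dataclasses import dataclass
from enum import IntEnum


class AttrId(IntEnum):
    # Hypothetical IDs; the real ones are enumerated in spacy.typedefs.
    LEMMA = 1
    POS = 2
    TAG = 3
    DEP = 4


@dataclass
class TokenModel:
    # Stand-in for a few TokenC fields (integer IDs, as in the struct).
    lemma: int = 0
    pos: int = 0
    tag: int = 0
    dep: int = 0


# Dispatch table from attribute ID to field name.
_FIELD_BY_ATTR = {
    AttrId.LEMMA: "lemma",
    AttrId.POS: "pos",
    AttrId.TAG: "tag",
    AttrId.DEP: "dep",
}


def get_struct_attr(token: TokenModel, feat_name: int) -> int:
    """Return the field selected by feat_name (0 for unknown IDs, an assumption)."""
    field = _FIELD_BY_ATTR.get(feat_name)
    return getattr(token, field) if field is not None else 0


def set_struct_attr(token: TokenModel, feat_name: int, value: int) -> None:
    """Write value into the field selected by feat_name; ignore unknown IDs."""
    field = _FIELD_BY_ATTR.get(feat_name)
    if field is not None:
        setattr(token, field, value)
```

Dispatching on an integer ID rather than a field name is what lets generic code, such as a feature extractor, handle any attribute through one code path.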

token_by_start (function)

Find a token in a TokenC* array by the offset of its first character.

| Name | Description | Type |
| --- | --- | --- |
| tokens | A TokenC* array. | const TokenC* |
| length | The number of tokens in the array. | int |
| start_char | The start index to search for. | int |
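
The lookup can be pictured as a search over the tokens' idx values, which increase from left to right. The sketch below is a pure-Python analogue, not the Cython implementation: it takes a plain list of start offsets in place of a TokenC* array, uses binary search as one possible strategy, and returns -1 when no token starts at the given offset (an assumed convention).

```python
from bisect import bisect_left
from typing import Sequence


def token_by_start(starts: Sequence[int], start_char: int) -> int:
    """Return the index of the token starting at start_char, or -1 if none does.

    starts holds each token's character offset (the idx field) in ascending order.
    """
    i = bisect_left(starts, start_char)
    if i < len(starts) and starts[i] == start_char:
        return i
    return -1


# "Hello big world" has tokens starting at offsets 0, 6 and 10.
starts = [0, 6, 10]
print(token_by_start(starts, 6))  # index of "big"
print(token_by_start(starts, 7))  # no token starts here
```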

token_by_end (function)

Find a token in a TokenC* array by the offset of its final character.

| Name | Description | Type |
| --- | --- | --- |
| tokens | A TokenC* array. | const TokenC* |
| length | The number of tokens in the array. | int |
| end_char | The end index to search for. | int |
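
A token's end offset is not stored directly; it can be derived from the token's start offset plus the length of its lexeme. The pure-Python sketch below makes two assumptions that the text does not state: the end index is exclusive (one past the last character), and -1 signals that no token ends there. Tokens are modelled as (idx, length) pairs rather than TokenC structs.

```python
from typing import Sequence, Tuple


def token_by_end(tokens: Sequence[Tuple[int, int]], end_char: int) -> int:
    """Return the index of the token whose exclusive end offset is end_char, or -1.

    Each token is an (idx, length) pair: start offset and number of characters.
    """
    for i, (idx, length) in enumerate(tokens):
        if idx + length == end_char:
            return i
    return -1


# "Hello big world": (start, length) pairs give exclusive ends 5, 9 and 15.
tokens = [(0, 5), (6, 3), (10, 5)]
print(token_by_end(tokens, 9))  # index of "big"
print(token_by_end(tokens, 8))  # no token ends here
```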

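set_children_from_heads (function)

Set attributes that allow lookup of syntactic children on a TokenC* array. This function must be called after modifying the TokenC.head attribute, to ensure that parse tree navigation remains consistent.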

| Name | Description | Type |
| --- | --- | --- |
| tokens | A TokenC* array. | const TokenC* |
| length | The number of tokens in the array. | int |
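
To see why the derived attributes go stale when head changes, it helps to compute them by hand. The following pure-Python sketch recomputes l_kids, r_kids, l_edge and r_edge from a list of relative head offsets (0 marks a root). It is an illustration under assumptions, not the Cython routine: edges are stored here as absolute token indices, and the offsets are assumed to describe a valid tree.

```python
from typing import Dict, List


def set_children_from_heads(heads: List[int]) -> Dict[str, List[int]]:
    """Derive child counts and subtree edges from relative head offsets."""
    n = len(heads)
    l_kids = [0] * n
    r_kids = [0] * n
    l_edge = list(range(n))  # every token initially spans only itself
    r_edge = list(range(n))
    for i, offset in enumerate(heads):
        if offset > 0:
            l_kids[i + offset] += 1  # head is to the right, so i is a left child
        elif offset < 0:
            r_kids[i + offset] += 1  # head is to the left, so i is a right child
        # Walk from token i up to its root, widening each ancestor's span.
        j = i
        for _ in range(n):  # bounded so a malformed cycle cannot loop forever
            l_edge[j] = min(l_edge[j], i)
            r_edge[j] = max(r_edge[j], i)
            if heads[j] == 0:
                break
            j += heads[j]
    return {"l_kids": l_kids, "r_kids": r_kids, "l_edge": l_edge, "r_edge": r_edge}


# "She ate the pizza": She -> ate, ate is the root, the -> pizza, pizza -> ate.
print(set_children_from_heads([1, 0, 1, -2]))
```

In this example the root "ate" ends up with one left child, one right child and a span covering the whole sentence. Changing any head offset without recomputing would leave these counts and spans describing the old tree.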

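LexemeC (C struct)

Struct holding information about a lexical type. LexemeC structs are usually owned by the Vocab, and accessed through a read-only pointer on the TokenC struct.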

| Name | Description | Type |
| --- | --- | --- |
| flags | Bit-field for binary lexical flag values. | flags_t (uint64_t) |
| id | Usually used to map lexemes to rows in a matrix, e.g. for word vectors. Does not need to be unique, so currently misnamed. | attr_t (uint64_t) |
| length | Number of unicode characters in the lexeme. | attr_t (uint64_t) |
| orth | ID of the verbatim text content. | attr_t (uint64_t) |
| lower | ID of the lowercase form of the lexeme. | attr_t (uint64_t) |
| norm | ID of the lexeme's norm, i.e. a normalized form of the text. | attr_t (uint64_t) |
| shape | Transform of the lexeme's string, to show orthographic features. | attr_t (uint64_t) |
| prefix | Length-N substring from the start of the lexeme. Defaults to N=1. | attr_t (uint64_t) |
| suffix | Length-N substring from the end of the lexeme. Defaults to N=3. | attr_t (uint64_t) |
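
Several of these fields are simple functions of the lexeme's text. The sketch below shows, in plain Python, what the length, lower, prefix and suffix fields describe for a given string. Note the simplification: the struct stores integer IDs for these strings, whereas this hypothetical helper returns the strings themselves.

```python
from typing import Dict, Union


def lexeme_features(
    text: str, prefix_len: int = 1, suffix_len: int = 3
) -> Dict[str, Union[str, int]]:
    """Compute string-level analogues of a few LexemeC fields."""
    return {
        "orth": text,                   # verbatim text
        "lower": text.lower(),          # lowercase form
        "prefix": text[:prefix_len],    # length-N substring from the start (N=1)
        "suffix": text[-suffix_len:],   # length-N substring from the end (N=3)
        "length": len(text),            # number of unicode characters
    }


print(lexeme_features("Apple"))
```

Because these values depend only on the word type and not on its context, they can be computed once per vocabulary entry and shared by every token that points at the same lexeme.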

Lexeme.get_struct_attr (staticmethod, nogil)

Get the value of an attribute from the LexemeC struct by attribute ID.

| Name | Description | Type |
| --- | --- | --- |
| lex | A pointer to a LexemeC struct. | const LexemeC* |
| feat_name | The ID of the attribute to look up. The attributes are enumerated in spacy.typedefs. | attr_id_t |

Lexeme.set_struct_attr (staticmethod, nogil)

Set the value of an attribute of the LexemeC struct by attribute ID.

| Name | Description | Type |
| --- | --- | --- |
| lex | A pointer to a LexemeC struct. | const LexemeC* |
| feat_name | The ID of the attribute to set. The attributes are enumerated in spacy.typedefs. | attr_id_t |
| value | The value to set. | attr_t (uint64_t) |

Lexeme.c_check_flag (staticmethod, nogil)

Check the value of a binary flag attribute.

| Name | Description | Type |
| --- | --- | --- |
| lexeme | A pointer to a LexemeC struct. | const LexemeC* |
| flag_id | The ID of the flag to look up. The flag IDs are enumerated in spacy.typedefs. | attr_id_t |

Lexeme.c_set_flag (staticmethod, nogil)

Set the value of a binary flag attribute.

| Name | Description | Type |
| --- | --- | --- |
| lexeme | A pointer to a LexemeC struct. | const LexemeC* |
| flag_id | The ID of the flag to set. The flag IDs are enumerated in spacy.typedefs. | attr_id_t |
| value | The value to set. | bint |
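
Since flags is a 64-bit bit-field, checking and setting a flag comes down to bit masking. The sketch below models that in pure Python, using a plain integer for the flags field and treating the flag ID as a bit position. That mapping is an assumption made for illustration; the function names mirror the methods above but these are not the Cython implementations.

```python
MASK64 = (1 << 64) - 1  # keep results within an unsigned 64-bit range


def check_flag(flags: int, flag_id: int) -> bool:
    """Return True if the bit at position flag_id is set."""
    return bool(flags & (1 << flag_id))


def set_flag(flags: int, flag_id: int, value: bool) -> int:
    """Return a copy of flags with the bit at flag_id set or cleared."""
    if value:
        return (flags | (1 << flag_id)) & MASK64
    return flags & ~(1 << flag_id) & MASK64


flags = 0
flags = set_flag(flags, 3, True)   # switch flag 3 on
print(check_flag(flags, 3), check_flag(flags, 2))
flags = set_flag(flags, 3, False)  # and off again
print(flags)
```

Packing all binary flags into one integer keeps the struct small and makes each check a single bitwise operation, which suits code that runs without the GIL.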