Other

Attributes

Token attributes

Token attributes are specified by internal IDs in many places throughout the library.

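All methods automatically convert between the string version of an ID ("DEP") and the internal integer symbol (DEP). The internal IDs can be imported from spacy.attrs or retrieved from the StringStore. The mapping from string attribute names to internal attribute IDs is stored in spacy.attrs.IDS.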
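As a minimal sketch of this conversion, the example below passes the same attributes to Doc.to_array once as string names and once as integer IDs imported from spacy.attrs. The choice of Doc.to_array, the blank English pipeline, and the sample sentence are assumptions made for illustration; no trained model is needed.

```python
import spacy
from spacy.attrs import DEP, IDS, LOWER, ORTH

# Assumption: a blank English pipeline (tokenizer only) is enough here.
nlp = spacy.blank("en")
doc = nlp("Hello hello world")

# The same attributes, given as integer IDs and as string names.
by_id = doc.to_array([ORTH, LOWER])
by_name = doc.to_array(["ORTH", "LOWER"])

# Both calls produce the same array: one row per token, one column per attribute.
same = bool((by_id == by_name).all())

# spacy.attrs.IDS maps each string name to its internal integer ID.
dep_id = IDS["DEP"]

# Array values are hashes; the StringStore resolves them back to text.
lower_of_first = nlp.vocab.strings[int(by_id[0, 1])]
print(same, dep_id == DEP, lower_of_first)
```

Using string names tends to be more readable in application code, while the imported integer IDs avoid a lookup when the attribute list is built repeatedly.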

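The corresponding Token object attributes can be accessed using the same names in lowercase, e.g. token.orth or token.length. For attributes that represent string values, the internal integer ID is accessed as Token.attr, e.g. token.dep, while the string value can be retrieved by appending an underscore, as in token.dep_.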
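The naming convention can be sketched as follows. The example uses ORTH and LOWER rather than DEP, because a blank pipeline (an assumption made here to avoid downloading a model) has no parser and would leave the dependency label empty.

```python
import spacy

# Assumption: blank English pipeline, tokenizer only.
nlp = spacy.blank("en")
doc = nlp("Hello world")
token = doc[0]

# Without the underscore: the internal integer ID (a hash of the string).
orth_id = token.orth
lower_id = token.lower

# With the underscore: the human-readable string value.
orth_text = token.orth_
lower_text = token.lower_

# The StringStore maps between the two representations in both directions.
strings = nlp.vocab.strings
print(orth_id, orth_text, strings[orth_id] == orth_text)
print(lower_id, lower_text, strings[lower_text] == lower_id)
```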

| Attribute | Description | Type |
| --- | --- | --- |
| DEP | The token’s dependency label. | str |
| ENT_ID | The token’s entity ID (ent_id). | str |
| ENT_IOB | The IOB part of the token’s entity tag. Uses custom integer values rather than the string store: unset is 0, I is 1, O is 2, and B is 3. | str |
| ENT_KB_ID | The token’s entity knowledge base ID. | str |
| ENT_TYPE | The token’s entity label. | str |
| IS_ALPHA | Token text consists of alphabetic characters. | bool |
| IS_ASCII | Token text consists of ASCII characters. | bool |
| IS_DIGIT | Token text consists of digits. | bool |
| IS_LOWER | Token text is in lowercase. | bool |
| IS_PUNCT | Token is punctuation. | bool |
| IS_SPACE | Token is whitespace. | bool |
| IS_STOP | Token is a stop word. | bool |
| IS_TITLE | Token text is in titlecase. | bool |
| IS_UPPER | Token text is in uppercase. | bool |
| LEMMA | The token’s lemma. | str |
| LENGTH | The length of the token text. | int |
| LIKE_EMAIL | Token text resembles an email address. | bool |
| LIKE_NUM | Token text resembles a number. | bool |
| LIKE_URL | Token text resembles a URL. | bool |
| LOWER | The lowercase form of the token text. | str |
| MORPH | The token’s morphological analysis. | MorphAnalysis |
| NORM | The normalized form of the token text. | str |
| ORTH | The exact verbatim text of a token. | str |
| POS | The token’s universal part of speech (UPOS). | str |
| SENT_START | Token is start of sentence. | bool |
| SHAPE | The token’s shape. | str |
| SPACY | Token has a trailing space. | bool |
| TAG | The token’s fine-grained part of speech. | str |
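To make a few of these attributes concrete, the sketch below reads some boolean flags and the shape from a tokenized sentence, then sets one entity by hand to show the integer values used by ENT_IOB. The sample sentence, the GPE label, and the use of Doc.set_ents on a blank pipeline are assumptions chosen for illustration, not something a trained pipeline would require.

```python
import spacy
from spacy.tokens import Span

# Assumption: blank English pipeline; lexical flags work without a trained model.
nlp = spacy.blank("en")
doc = nlp("New York has 8 million people!")

# Lexical attributes are available directly after tokenization.
rows = [
    (t.text, t.is_alpha, t.is_digit, t.like_num, t.is_punct, t.shape_)
    for t in doc
]

# Before any entities are set, ENT_IOB is unset (0) on every token.
iob_before = [t.ent_iob for t in doc]

# Mark "New York" as an entity by hand (the label is a hypothetical choice).
doc.set_ents([Span(doc, 0, 2, label="GPE")])

# Integer value and string value of the IOB code for each token.
iob_after = [(t.ent_iob, t.ent_iob_) for t in doc]

for row in rows:
    print(row)
print(iob_before)
print(iob_after)
```

Because ENT_IOB uses its own small integer codes instead of string store hashes, the integer values can be compared directly against 0, 1, 2, and 3.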