Attributes

Token attributes

Token attributes are specified using internal IDs in many places, including:

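All methods automatically convert between the string version of an ID (`"DEP"`) and the internal integer symbol (`DEP`). The internal IDs can be imported from `spacy.attrs` or retrieved from the `StringStore`. The mapping from string attribute names to internal attribute IDs is stored in `spacy.attrs.IDS`.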
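As a minimal sketch of this conversion, the snippet below passes the same attributes to `Doc.to_array` once as string names and once as integer symbols, then looks up an ID through `spacy.attrs.IDS`. It assumes spaCy v3 is installed and uses a blank English pipeline, so no trained model is needed; the example text and variable names are illustrative only.

```python
import spacy
from spacy.attrs import DEP, IDS, LOWER, ORTH

# A blank pipeline only tokenizes, which is enough for lexical attributes.
nlp = spacy.blank("en")
doc = nlp("Hello hello world")

# The same attributes, given as string names and as integer symbols.
by_name = doc.to_array(["ORTH", "LOWER"])
by_id = doc.to_array([ORTH, LOWER])

# Both calls should yield identical arrays of shape (n_tokens, n_attrs).
same = bool((by_name == by_id).all())

# IDS maps the string attribute name to the internal integer ID.
dep_id = IDS["DEP"]

print(same, dep_id == DEP)
```

Integer symbols avoid a name lookup and are checked at import time, while string names are often easier to read in configuration or pattern files.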

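The corresponding `Token` object attributes can be accessed using the same names in lowercase, e.g. `token.orth` or `token.length`. For attributes that represent string values, the internal integer ID is accessed as `Token.attr`, e.g. `token.dep`, while the string value can be retrieved by appending an underscore `_`, as in `token.dep_`.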
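The naming convention can be illustrated with a short sketch. It assumes spaCy v3 and a blank English pipeline, so only lexical attributes such as `orth` and `lower` are populated; attributes like `dep` would stay unset without a trained parser. The sample text is an assumption for illustration.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Hello world")
token = doc[0]

# Without the underscore: the internal integer ID.
orth_id = token.orth

# With the underscore: the string value.
orth_text = token.orth_
lower_text = token.lower_

# The StringStore resolves the integer ID back to its string.
resolved = nlp.vocab.strings[orth_id]

print(orth_id, orth_text, lower_text, resolved)
```

Comparing the integer IDs is cheaper than comparing strings, which is why the ID forms are commonly used in tight loops, with the underscore forms reserved for display.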

Attribute	Description	Type
`DEP`	The token’s dependency label.	str
`ENT_ID`	The token’s entity ID (`ent_id`).	str
`ENT_IOB`	The IOB part of the token’s entity tag. Uses custom integer values rather than the string store: unset is `0`, `I` is `1`, `O` is `2`, and `B` is `3`.	str
`ENT_KB_ID`	The token’s entity knowledge base ID.	str
`ENT_TYPE`	The token’s entity label.	str
`IS_ALPHA`	Token text consists of alphabetic characters.	bool
`IS_ASCII`	Token text consists of ASCII characters.	bool
`IS_DIGIT`	Token text consists of digits.	bool
`IS_LOWER`	Token text is in lowercase.	bool
`IS_PUNCT`	Token is punctuation.	bool
`IS_SPACE`	Token is whitespace.	bool
`IS_STOP`	Token is a stop word.	bool
`IS_TITLE`	Token text is in titlecase.	bool
`IS_UPPER`	Token text is in uppercase.	bool
`LEMMA`	The token’s lemma.	str
`LENGTH`	The length of the token text.	int
`LIKE_EMAIL`	Token text resembles an email address.	bool
`LIKE_NUM`	Token text resembles a number.	bool
`LIKE_URL`	Token text resembles a URL.	bool
`LOWER`	The lowercase form of the token text.	str
`MORPH`	The token’s morphological analysis.	MorphAnalysis
`NORM`	The normalized form of the token text.	str
`ORTH`	The exact verbatim text of a token.	str
`POS`	The token’s universal part of speech (UPOS).	str
`SENT_START`	Token is start of sentence.	bool
`SHAPE`	The token’s shape.	str
`SPACY`	Token has a trailing space.	bool
`TAG`	The token’s fine-grained part of speech.	str

Other