Lexeme

class

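An entry in the vocabulary

A `Lexeme` has no string context: it is a word type, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse, or lemma (if lemmatization depends on the part-of-speech tag).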

Lexeme.__init__ method

Create a `Lexeme` object.

Name	Description
`vocab`	The parent vocabulary. Vocab
`orth`	The orth id of the lexeme. int
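
The constructor arguments above can be illustrated with a minimal sketch. It assumes spaCy is installed and uses a blank English pipeline, so no trained model is needed; the word "apple" is only an example.

```python
import spacy
from spacy.lexeme import Lexeme

# A blank pipeline provides a Vocab without requiring a trained model.
nlp = spacy.blank("en")

# Register the string first so that its orth ID can be resolved back to text.
orth = nlp.vocab.strings.add("apple")

# Construct the lexeme directly from the vocabulary and the orth ID.
lexeme = Lexeme(nlp.vocab, orth)
print(lexeme.text, lexeme.orth)
```

In everyday code, looking the entry up with `nlp.vocab["apple"]` is the more common way to obtain a `Lexeme`.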

Lexeme.set_flag method

Change the value of a boolean flag.

Name	Description
`flag_id`	The attribute ID of the flag to set. int
`value`	The new value of the flag. bool

Lexeme.check_flag method

Check the value of a boolean flag.

Name	Description
`flag_id`	The attribute ID of the flag to query. int
RETURNS	The value of the flag. bool
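
A short sketch of setting and querying a custom flag. It assumes spaCy is installed and that the flag ID is obtained from `Vocab.add_flag`; the flag name `IS_LIBRARY` and the word list are made up for illustration.

```python
import spacy

nlp = spacy.blank("en")

# Hypothetical flag: True for strings that name a library we care about.
def is_library(text):
    return text in ("spaCy", "Thinc")

# add_flag registers the getter and returns the ID of the new flag.
IS_LIBRARY = nlp.vocab.add_flag(is_library)

# The getter sets the flag when the lexeme is created.
spacy_lex = nlp.vocab["spaCy"]
print(spacy_lex.check_flag(IS_LIBRARY))

# set_flag overrides the stored value for a single lexeme.
other_lex = nlp.vocab["numpy"]
other_lex.set_flag(IS_LIBRARY, True)
print(other_lex.check_flag(IS_LIBRARY))
```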

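Lexeme.similarity method (needs model)

Compute a semantic similarity estimate. Defaults to cosine similarity over vectors.

Name	Description
`other`	The object to compare with. By default, accepts `Doc`, `Span`, `Token` and `Lexeme` objects. Union[Doc,Span,Token,Lexeme]
RETURNS	A scalar similarity score. Higher is more similar. float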
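
To make the default measure concrete, here is a plain-Python sketch of cosine similarity. It is not spaCy's implementation; the function name, the toy vectors, and the choice to return 0.0 for a zero vector are assumptions made for illustration.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a missing (all-zero) vector as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)

# Toy vectors standing in for word vectors.
cat = [1.0, 2.0, 0.0]
dog = [2.0, 4.0, 0.0]   # same direction as cat
car = [0.0, 0.0, 3.0]   # orthogonal to cat

print(cosine_similarity(cat, dog))
print(cosine_similarity(cat, car))
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0, regardless of their lengths.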

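Lexeme.has_vector property (needs model)

A boolean value indicating whether a word vector is associated with the lexeme.

Name	Description
RETURNS	Whether the lexeme has a vector data attached. bool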

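Lexeme.vector property (needs model)

A real-valued meaning representation.

Name	Description
RETURNS	A 1-dimensional array representing the lexeme's vector. numpy.ndarray[ndim=1, dtype=float32]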

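Lexeme.vector_norm property (needs model)

The L2 norm of the lexeme's vector representation.

Name	Description
RETURNS	The L2 norm of the vector representation. float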
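
For reference, the L2 norm is the square root of the sum of squared components. The sketch below uses a made-up two-dimensional vector rather than a real word vector, and also shows that dividing by the norm yields a unit-length vector.

```python
import math

# Toy vector standing in for a lexeme's word vector.
vector = [3.0, 4.0]

# L2 norm: square root of the sum of squared components.
vector_norm = math.sqrt(sum(x * x for x in vector))

# Dividing each component by the norm gives a vector of length 1.
unit = [x / vector_norm for x in vector]
unit_norm = math.sqrt(sum(x * x for x in unit))

print(vector_norm, unit, unit_norm)
```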

Attributes

Name	Description
`vocab`	The lexeme’s vocabulary. Vocab
`text`	Verbatim text content. str
`orth`	ID of the verbatim text content. int
`orth_`	Verbatim text content (identical to `Lexeme.text`). Exists mostly for consistency with the other attributes. str
`rank`	Sequential ID of the lexeme’s lexical type, used to index into tables, e.g. for word vectors. int
`flags`	Container of the lexeme’s binary flags. int
`norm`	The lexeme’s norm, i.e. a normalized form of the lexeme text. int
`norm_`	The lexeme’s norm, i.e. a normalized form of the lexeme text. str
`lower`	Lowercase form of the word. int
`lower_`	Lowercase form of the word. str
`shape`	Transform of the word’s string, to show orthographic features. Alphabetic characters are replaced by `x` or `X`, numeric characters are replaced by `d`, and sequences of the same character are truncated after length 4. For example, `"Xxxx"` or `"dd"`. int
`shape_`	Transform of the word’s string, to show orthographic features. Alphabetic characters are replaced by `x` or `X`, numeric characters are replaced by `d`, and sequences of the same character are truncated after length 4. For example, `"Xxxx"` or `"dd"`. str
`prefix`	Length-N substring from the start of the word. Defaults to `N=1`. int
`prefix_`	Length-N substring from the start of the word. Defaults to `N=1`. str
`suffix`	Length-N substring from the end of the word. Defaults to `N=3`. int
`suffix_`	Length-N substring from the end of the word. Defaults to `N=3`. str
`is_alpha`	Does the lexeme consist of alphabetic characters? Equivalent to `lexeme.text.isalpha()`. bool
`is_ascii`	Does the lexeme consist of ASCII characters? Equivalent to `not any(ord(c) >= 128 for c in lexeme.text)`. bool
`is_digit`	Does the lexeme consist of digits? Equivalent to `lexeme.text.isdigit()`. bool
`is_lower`	Is the lexeme in lowercase? Equivalent to `lexeme.text.islower()`. bool
`is_upper`	Is the lexeme in uppercase? Equivalent to `lexeme.text.isupper()`. bool
`is_title`	Is the lexeme in titlecase? Equivalent to `lexeme.text.istitle()`. bool
`is_punct`	Is the lexeme punctuation? bool
`is_left_punct`	Is the lexeme a left punctuation mark, e.g. `(`? bool
`is_right_punct`	Is the lexeme a right punctuation mark, e.g. `)`? bool
`is_space`	Does the lexeme consist of whitespace characters? Equivalent to `lexeme.text.isspace()`. bool
`is_bracket`	Is the lexeme a bracket? bool
`is_quote`	Is the lexeme a quotation mark? bool
`is_currency`	Is the lexeme a currency symbol? bool
`like_url`	Does the lexeme resemble a URL? bool
`like_num`	Does the lexeme represent a number? e.g. “10.9”, “10”, “ten”, etc. bool
`like_email`	Does the lexeme resemble an email address? bool
`is_oov`	Is the lexeme out-of-vocabulary (i.e. does it not have a word vector)? bool
`is_stop`	Is the lexeme part of a “stop list”? bool
`lang`	Language of the parent vocabulary. int
`lang_`	Language of the parent vocabulary. str
`prob`	Smoothed log probability estimate of the lexeme’s word type (context-independent entry in the vocabulary). float
`cluster`	Brown cluster ID. int
`sentiment`	A scalar value indicating the positivity or negativity of the lexeme. float
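
Several of the string-based attributes in the table can be approximated in plain Python. The sketch below is written from the descriptions above and is not spaCy's own code; the function names are hypothetical, and only `shape_`, `prefix_`, `suffix_` and `is_ascii` are covered.

```python
def word_shape(text: str, max_run: int = 4) -> str:
    """Map letters to x/X and digits to d, truncating runs after max_run."""
    out = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            shape_ch = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            shape_ch = "d"
        else:
            shape_ch = ch  # punctuation and other characters are kept as-is
        if shape_ch == last:
            run += 1
        else:
            last = shape_ch
            run = 1
        if run <= max_run:
            out.append(shape_ch)
    return "".join(out)

def prefix(text: str, n: int = 1) -> str:
    """Length-n substring from the start of the word (default n=1)."""
    return text[:n]

def suffix(text: str, n: int = 3) -> str:
    """Length-n substring from the end of the word (default n=3)."""
    return text[-n:]

def is_ascii(text: str) -> bool:
    """True if no character has a code point of 128 or above."""
    return not any(ord(c) >= 128 for c in text)

for word in ["Apple", "Applesauce", "C3PO", "12345", "café"]:
    print(word, word_shape(word), prefix(word), suffix(word), is_ascii(word))
```

Because runs are truncated, words of different lengths such as "Apple" and "Applesauce" share the same shape, which is what makes the attribute useful as a coarse orthographic feature.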
