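Lexeme
A Lexeme has no string context: it is a word type, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse, or lemma (if lemmatization depends on the part-of-speech tag).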
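To make the type/token distinction concrete, here is a minimal sketch. It assumes spaCy v3 and a blank English pipeline from `spacy.blank("en")` (tokenizer only, no trained model); the sample sentence is made up for illustration.

```python
import spacy

# Assumption: a blank English pipeline, so no model download is needed.
nlp = spacy.blank("en")
doc = nlp("An apple is an apple")

first, second = doc[1], doc[4]   # two distinct tokens with the same text
lexeme = nlp.vocab["apple"]      # the context-independent vocabulary entry

print(first.i, second.i)                          # token positions differ
print(first.orth == second.orth == lexeme.orth)   # both map to one word type
```

The tokens carry positional context (`i`), while the lexeme only carries what is true of the string wherever it appears.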
Lexeme.__init__ method
Create a Lexeme object.
| Name | Description |
|---|---|
| vocab | The parent vocabulary. Vocab |
| orth | The orth id of the lexeme. int |
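A short sketch of constructing a Lexeme directly, assuming spaCy v3. The string is added to the vocabulary's `StringStore` first so that the orth ID can be resolved back to text; indexing the vocabulary by string is shown as the more common route to the same entry.

```python
from spacy.lexeme import Lexeme
from spacy.vocab import Vocab

vocab = Vocab()
# Register the string first; its hash is the orth ID passed to the constructor.
orth = vocab.strings.add("apple")
lexeme = Lexeme(vocab, orth)

# Indexing the vocabulary by string returns a Lexeme for the same orth ID.
same_lexeme = vocab["apple"]
print(lexeme.text, lexeme.orth == same_lexeme.orth)
```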
Lexeme.set_flag method
Change the value of a boolean flag.
| Name | Description |
|---|---|
| flag_id | The attribute ID of the flag to set. int |
| value | The new value of the flag. bool |
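The following sketch shows one way to set a flag, assuming spaCy v3. It relies on `Vocab.add_flag` to obtain a free flag ID; the name `COOL_FLAG` and the example strings are illustrative only.

```python
import spacy

nlp = spacy.blank("en")

# Vocab.add_flag registers a boolean flag and returns its attribute ID.
# The getter used here makes the flag False for every lexeme by default.
COOL_FLAG = nlp.vocab.add_flag(lambda text: False)

lexeme = nlp.vocab["spaCy"]
lexeme.set_flag(COOL_FLAG, True)   # override the default for this one lexeme

print(lexeme.check_flag(COOL_FLAG))
print(nlp.vocab["other"].check_flag(COOL_FLAG))
```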
Lexeme.check_flag method
Check the value of a boolean flag.
| Name | Description |
|---|---|
| flag_id | The attribute ID of the flag to query. int |
| RETURNS | The value of the flag. bool |
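As an illustration of querying a flag, the sketch below registers a custom flag with `Vocab.add_flag` and checks it on two lexemes. It assumes spaCy v3; the getter `is_my_library` and its word list are hypothetical.

```python
import spacy

nlp = spacy.blank("en")

def is_my_library(text):
    # Hypothetical getter: True only for a small hand-picked set of names.
    return text in ["spaCy", "Thinc"]

MY_LIBRARY = nlp.vocab.add_flag(is_my_library)

in_list = nlp.vocab["spaCy"].check_flag(MY_LIBRARY)
not_in_list = nlp.vocab["Python"].check_flag(MY_LIBRARY)
print(in_list, not_in_list)
```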
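Lexeme.similarity method (requires a model)
Compute a semantic similarity estimate. Defaults to cosine similarity over vectors.
| Name | Description |
|---|---|
| other | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects. Union[Doc,Span,Token,Lexeme] |
| RETURNS | A scalar similarity score. Higher is more similar. float |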
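A minimal sketch of the cosine computation, assuming spaCy v3. In practice the vectors would come from a pipeline that ships with word vectors; here two toy 3-dimensional vectors are set by hand with `Vocab.set_vector` so the example runs without downloading anything.

```python
import numpy
import spacy

nlp = spacy.blank("en")

# Assumption: toy vectors stand in for the vectors of a trained pipeline.
nlp.vocab.set_vector("apple", numpy.array([1.0, 0.0, 0.0], dtype="float32"))
nlp.vocab.set_vector("orange", numpy.array([1.0, 1.0, 0.0], dtype="float32"))

apple = nlp.vocab["apple"]
orange = nlp.vocab["orange"]

score = apple.similarity(orange)   # cosine of the angle between the vectors
print(round(score, 4))
```

With these vectors the cosine is 1/sqrt(2), and the score is symmetric in its two arguments.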
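Lexeme.has_vector property (requires a model)
A boolean value indicating whether a word vector is associated with the lexeme.
| Name | Description |
|---|---|
| RETURNS | Whether the lexeme has vector data attached. bool |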
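Lexeme.vector property (requires a model)
A real-valued meaning representation.
| Name | Description |
|---|---|
| RETURNS | A 1-dimensional array representing the lexeme's vector. numpy.ndarray[ndim=1, dtype=float32] |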
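Lexeme.vector_norm property (requires a model)
The L2 norm of the lexeme's vector representation.
| Name | Description |
|---|---|
| RETURNS | The L2 norm of the vector representation. float |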
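The three vector-related properties (has_vector, vector, vector_norm) can be inspected together. This sketch assumes spaCy v3 and, instead of a trained pipeline, a single hand-set toy vector; the string `qwertyuiop` is a made-up word chosen because it has no vector entry.

```python
import numpy
import spacy

nlp = spacy.blank("en")

# Assumption: one toy vector set by hand rather than loaded from a model.
nlp.vocab.set_vector("apple", numpy.array([3.0, 4.0, 0.0], dtype="float32"))

apple = nlp.vocab["apple"]
unknown = nlp.vocab["qwertyuiop"]   # no vector was registered for this string

print(apple.has_vector, unknown.has_vector)
print(apple.vector.shape)
print(apple.vector_norm)            # L2 norm: sqrt(3**2 + 4**2 + 0**2)
```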
Attributes
| Name | Description |
|---|---|
| vocab | The lexeme's vocabulary. Vocab |
| text | Verbatim text content. str |
| orth | ID of the verbatim text content. int |
| orth_ | Verbatim text content (identical to Lexeme.text). Exists mostly for consistency with the other attributes. str |
| rank | Sequential ID of the lexeme's lexical type, used to index into tables, e.g. for word vectors. int |
| flags | Container of the lexeme's binary flags. int |
| norm | The lexeme's norm, i.e. a normalized form of the lexeme text. int |
| norm_ | The lexeme's norm, i.e. a normalized form of the lexeme text. str |
| lower | Lowercase form of the word. int |
| lower_ | Lowercase form of the word. str |
| shape | Transform of the word's string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". int |
| shape_ | Transform of the word's string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". str |
| prefix | Length-N substring from the start of the word. Defaults to N=1. int |
| prefix_ | Length-N substring from the start of the word. Defaults to N=1. str |
| suffix | Length-N substring from the end of the word. Defaults to N=3. int |
| suffix_ | Length-N substring from the end of the word. Defaults to N=3. str |
| is_alpha | Does the lexeme consist of alphabetic characters? Equivalent to lexeme.text.isalpha(). bool |
| is_ascii | Does the lexeme consist of ASCII characters? Equivalent to all(ord(c) < 128 for c in lexeme.text). bool |
| is_digit | Does the lexeme consist of digits? Equivalent to lexeme.text.isdigit(). bool |
| is_lower | Is the lexeme in lowercase? Equivalent to lexeme.text.islower(). bool |
| is_upper | Is the lexeme in uppercase? Equivalent to lexeme.text.isupper(). bool |
| is_title | Is the lexeme in titlecase? Equivalent to lexeme.text.istitle(). bool |
| is_punct | Is the lexeme punctuation? bool |
| is_left_punct | Is the lexeme a left punctuation mark, e.g. "("? bool |
| is_right_punct | Is the lexeme a right punctuation mark, e.g. ")"? bool |
| is_space | Does the lexeme consist of whitespace characters? Equivalent to lexeme.text.isspace(). bool |
| is_bracket | Is the lexeme a bracket? bool |
| is_quote | Is the lexeme a quotation mark? bool |
| is_currency | Is the lexeme a currency symbol? bool |
| like_url | Does the lexeme resemble a URL? bool |
| like_num | Does the lexeme represent a number, e.g. "10.9", "10", "ten", etc.? bool |
| like_email | Does the lexeme resemble an email address? bool |
| is_oov | Is the lexeme out-of-vocabulary (i.e. does it not have a word vector)? bool |
| is_stop | Is the lexeme part of a "stop list"? bool |
| lang | Language of the parent vocabulary. int |
| lang_ | Language of the parent vocabulary. str |
| prob | Smoothed log probability estimate of the lexeme's word type (context-independent entry in the vocabulary). float |
| cluster | Brown cluster ID. int |
| sentiment | A scalar value indicating the positivity or negativity of the lexeme. float |
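A few of the attributes above can be read as follows. This is a minimal sketch assuming spaCy v3 and a blank English pipeline, whose vocabulary uses the English lexical attribute functions; the example words are arbitrary.

```python
import spacy

nlp = spacy.blank("en")
lexeme = nlp.vocab["Apple"]

# String attributes end in an underscore; the bare name holds the hash ID.
print(lexeme.text, lexeme.lower_, lexeme.shape_, lexeme.prefix_, lexeme.suffix_)
lower_from_id = nlp.vocab.strings[lexeme.lower]   # resolve the ID back to text

# Boolean flags are computed from the text and language data, not a sentence.
print(lexeme.is_alpha, lexeme.is_title, lexeme.is_digit)
print(nlp.vocab["ten"].like_num, nlp.vocab["10.9"].like_num)
```

Looking up an integer attribute in `nlp.vocab.strings` returns the same string as its underscore counterpart, which is why the table lists most attributes in pairs.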