Containers

Lexeme

class
An entry in the vocabulary

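A Lexeme has no string context: it is a word type, as opposed to a word token. It therefore has no part-of-speech tag, no dependency parse, and no lemma (if lemmatization depends on the part-of-speech tag).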
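
The type/token distinction can be illustrated with a toy sketch in plain Python. This is not how spaCy stores lexemes; the example sentence and the variable names are made up for illustration. Each occurrence in a text is a token, while each distinct string corresponds to a single type:

```python
from collections import Counter

# Hypothetical example sentence, split on whitespace for simplicity.
tokens = "the cat saw the dog".split()

# Every occurrence is a token; every distinct string is one type.
type_counts = Counter(tokens)

print(len(tokens))         # 5 tokens
print(len(type_counts))    # 4 types
print(type_counts["the"])  # "the" is one type that occurs as 2 tokens
```

Context-dependent information such as a part-of-speech tag belongs to an individual occurrence, which is why it cannot be stored on the type.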

Lexeme.__init__ method

Create a Lexeme object.

| Name | Description | Type |
| --- | --- | --- |
| vocab | The parent vocabulary. | Vocab |
| orth | The orth id of the lexeme. | int |

Lexeme.set_flag method

Change the value of a boolean flag.

| Name | Description | Type |
| --- | --- | --- |
| flag_id | The attribute ID of the flag to set. | int |
| value | The new value of the flag. | bool |

Lexeme.check_flag method

Check the value of a boolean flag.

| Name | Description | Type |
| --- | --- | --- |
| flag_id | The attribute ID of the flag to query. | int |
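
Since the lexeme's binary flags are held in a single integer, setting and checking a flag can be pictured as bit operations on that integer. The following is a minimal sketch under that assumption, not spaCy's implementation: the `FlagStore` class and the flag IDs `IS_ALPHA` and `IS_DIGIT` are hypothetical names chosen for the example.

```python
# Hypothetical flag IDs; real attribute IDs come from the library.
IS_ALPHA = 0
IS_DIGIT = 1


class FlagStore:
    """Toy container that packs boolean flags into one integer."""

    def __init__(self) -> None:
        self.flags = 0

    def set_flag(self, flag_id: int, value: bool) -> None:
        if value:
            self.flags |= 1 << flag_id     # switch the bit on
        else:
            self.flags &= ~(1 << flag_id)  # switch the bit off

    def check_flag(self, flag_id: int) -> bool:
        return bool(self.flags & (1 << flag_id))


store = FlagStore()
store.set_flag(IS_ALPHA, True)
print(store.check_flag(IS_ALPHA))  # True
print(store.check_flag(IS_DIGIT))  # False
```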

Lexeme.similarity method (needs model)

Compute a semantic similarity estimate. Defaults to cosine similarity over vectors.

| Name | Description | Type |
| --- | --- | --- |
| other | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects. | Union[Doc, Span, Token, Lexeme] |
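
For reference, cosine similarity over two vectors can be sketched in plain Python as below. The function name `cosine_similarity`, the list-of-floats input and the choice to return 0.0 for a zero vector are assumptions made for this example; they are not taken from the library.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption for this sketch: no direction, so report 0.0.
        return 0.0
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score close to 1, orthogonal ones 0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```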

Lexeme.has_vector property (needs model)

A boolean value indicating whether a word vector is associated with the lexeme.

Lexeme.vector property (needs model)

A real-valued meaning representation.

Lexeme.vector_norm property (needs model)

The L2 norm of the lexeme's vector representation.
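
The L2 norm is the square root of the sum of squared components. A minimal sketch in plain Python, where the helper name `l2_norm` and the sample vector are illustrative only:

```python
import math
from typing import Sequence


def l2_norm(vector: Sequence[float]) -> float:
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


vec = [3.0, 4.0]
norm = l2_norm(vec)
print(norm)  # 5.0

# Dividing by the norm gives a vector of length 1.
unit = [x / norm for x in vec]
print(unit)
```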

Attributes

| Name | Description | Type |
| --- | --- | --- |
| vocab | The lexeme’s vocabulary. | Vocab |
| text | Verbatim text content. | str |
| orth | ID of the verbatim text content. | int |
| orth_ | Verbatim text content (identical to Lexeme.text). Exists mostly for consistency with the other attributes. | str |
| rank | Sequential ID of the lexeme’s lexical type, used to index into tables, e.g. for word vectors. | int |
| flags | Container of the lexeme’s binary flags. | int |
| norm | The lexeme’s norm, i.e. a normalized form of the lexeme text. | int |
| norm_ | The lexeme’s norm, i.e. a normalized form of the lexeme text. | str |
| lower | Lowercase form of the word. | int |
| lower_ | Lowercase form of the word. | str |
| shape | Transform of the word’s string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". | int |
| shape_ | Transform of the word’s string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". | str |
| prefix | Length-N substring from the start of the word. Defaults to N=1. | int |
| prefix_ | Length-N substring from the start of the word. Defaults to N=1. | str |
| suffix | Length-N substring from the end of the word. Defaults to N=3. | int |
| suffix_ | Length-N substring from the end of the word. Defaults to N=3. | str |
| is_alpha | Does the lexeme consist of alphabetic characters? Equivalent to `lexeme.text.isalpha()`. | bool |
| is_ascii | Does the lexeme consist of ASCII characters? Equivalent to `not any(ord(c) >= 128 for c in lexeme.text)`. | bool |
| is_digit | Does the lexeme consist of digits? Equivalent to `lexeme.text.isdigit()`. | bool |
| is_lower | Is the lexeme in lowercase? Equivalent to `lexeme.text.islower()`. | bool |
| is_upper | Is the lexeme in uppercase? Equivalent to `lexeme.text.isupper()`. | bool |
| is_title | Is the lexeme in titlecase? Equivalent to `lexeme.text.istitle()`. | bool |
| is_punct | Is the lexeme punctuation? | bool |
| is_left_punct | Is the lexeme a left punctuation mark, e.g. `(`? | bool |
| is_right_punct | Is the lexeme a right punctuation mark, e.g. `)`? | bool |
| is_space | Does the lexeme consist of whitespace characters? Equivalent to `lexeme.text.isspace()`. | bool |
| is_bracket | Is the lexeme a bracket? | bool |
| is_quote | Is the lexeme a quotation mark? | bool |
| is_currency | Is the lexeme a currency symbol? | bool |
| like_url | Does the lexeme resemble a URL? | bool |
| like_num | Does the lexeme represent a number? e.g. “10.9”, “10”, “ten”, etc. | bool |
| like_email | Does the lexeme resemble an email address? | bool |
| is_oov | Is the lexeme out-of-vocabulary (i.e. does it not have a word vector)? | bool |
| is_stop | Is the lexeme part of a “stop list”? | bool |
| lang | Language of the parent vocabulary. | int |
| lang_ | Language of the parent vocabulary. | str |
| prob | Smoothed log probability estimate of the lexeme’s word type (context-independent entry in the vocabulary). | float |
| cluster | Brown cluster ID. | int |
| sentiment | A scalar value indicating the positivity or negativity of the lexeme. | float |
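
Several of the string-derived attributes above can be approximated in a few lines of plain Python. The sketch below follows the descriptions in the table rather than the library's source code; the helper names `word_shape`, `prefix`, `suffix` and `is_ascii` and the `max_run` parameter are hypothetical, and edge cases may differ from the real attributes.

```python
def word_shape(text: str, max_run: int = 4) -> str:
    """Map letters to x/X and digits to d, truncating runs after max_run."""
    shape = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            mapped = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            mapped = "d"
        else:
            mapped = ch  # other characters are kept as they are
        if mapped == last:
            run += 1
        else:
            last = mapped
            run = 1
        if run <= max_run:
            shape.append(mapped)
    return "".join(shape)


def prefix(text: str, n: int = 1) -> str:
    """Length-n substring from the start of the word."""
    return text[:n]


def suffix(text: str, n: int = 3) -> str:
    """Length-n substring from the end of the word (assumes n >= 1)."""
    return text[-n:]


def is_ascii(text: str) -> bool:
    """True if no character has a code point of 128 or above."""
    return not any(ord(c) >= 128 for c in text)


print(word_shape("Apples"))  # Xxxxx
print(word_shape("12345"))   # dddd
print(prefix("spaCy"), suffix("spaCy"))  # s aCy
print(is_ascii("cafe"), is_ascii("café"))  # True False
```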