Matchers

DependencyMatcher

class (new in v3)
Match subtrees within a dependency parse

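The DependencyMatcher follows the same API as the Matcher and PhraseMatcher and lets you match on dependency trees using Semgrex operators. It requires a pretrained DependencyParser or another component that sets the Token.dep and Token.head attributes. See the usage guide for examples.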
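Because the matcher only reads Token.dep and Token.head, you can try it without a trained parser by assigning a parse by hand. The sketch below assumes spaCy v3 is installed. The sentence, its heads and its dependency labels are made up for illustration and stand in for the output of a DependencyParser.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc

# A blank pipeline has no parser; the parse below is assigned by hand.
nlp = spacy.blank("en")

# Hypothetical parse of "Smith founded a company".
# In spaCy v3, heads are absolute token indices; the root points to itself.
doc = Doc(
    nlp.vocab,
    words=["Smith", "founded", "a", "company"],
    heads=[1, 1, 3, 1],
    deps=["nsubj", "ROOT", "det", "dobj"],
)

matcher = DependencyMatcher(nlp.vocab)
pattern = [
    # Anchor node: the verb "founded"
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    # "verb > subject": the verb is the immediate head of an nsubj token
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
]
matcher.add("FOUNDED", [pattern])

for match_id, token_ids in matcher(doc):
    # match_id is a hash; look up its string form in the shared vocab
    print(nlp.vocab.strings[match_id], [doc[i].text for i in token_ids])
```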

Pattern format

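A pattern added to the DependencyMatcher consists of a list of dictionaries, with each dictionary describing a token to match. Except for the first dictionary, which defines an anchor token using only RIGHT_ID and RIGHT_ATTRS, each dictionary in the pattern should have the following keys: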

Name | Description | Type
LEFT_ID | The name of the left-hand node in the relation, which has been defined in an earlier node. | str
REL_OP | An operator that describes how the two nodes are related. | str
RIGHT_ID | A unique name for the right-hand node in the relation. | str
RIGHT_ATTRS | The token attributes to match for the right-hand node, in the same format as patterns provided to the regular token-based Matcher. | Dict[str, Any]
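The key layout can be seen in a three-node pattern: an anchor, then two nodes that each point back to it through LEFT_ID. This is a sketch assuming spaCy v3 and a hand-assigned (hypothetical) parse. It also assumes that add() rejects a LEFT_ID that no earlier node has defined by raising ValueError; treat that check as something to confirm against your installed version.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc

nlp = spacy.blank("en")
# Hypothetical hand-assigned parse standing in for a trained parser
doc = Doc(nlp.vocab, words=["Smith", "founded", "a", "company"],
          heads=[1, 1, 3, 1], deps=["nsubj", "ROOT", "det", "dobj"])

pattern = [
    # Anchor node: only RIGHT_ID and RIGHT_ATTRS
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    # Later nodes attach to a node defined earlier via LEFT_ID
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "object",
     "RIGHT_ATTRS": {"DEP": "dobj"}},
]
matcher = DependencyMatcher(nlp.vocab)
matcher.add("FOUNDED", [pattern])

# token_ids holds one index per node, in the order the nodes appear in the pattern
named = [
    {node["RIGHT_ID"]: doc[i].text for node, i in zip(pattern, token_ids)}
    for _, token_ids in matcher(doc)
]
print(named)

# A LEFT_ID that no earlier node defined is expected to be rejected by add()
bad_pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "noun", "REL_OP": ">", "RIGHT_ID": "det",
     "RIGHT_ATTRS": {"DEP": "det"}},
]
try:
    matcher.add("BAD", [bad_pattern])
    rejected = False
except ValueError:
    rejected = True
print("rejected:", rejected)
```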

Operators

The following operators are supported by the DependencyMatcher, most of which come directly from Semgrex:

Symbol | Description
A < B | A is the immediate dependent of B.
A > B | A is the immediate head of B.
A << B | A is the dependent in a chain to B following dep → head paths.
A >> B | A is the head in a chain to B following head → dep paths.
A . B | A immediately precedes B, i.e. A.i == B.i - 1, and both are within the same dependency tree.
A .* B | A precedes B, i.e. A.i < B.i, and both are within the same dependency tree (Semgrex counterpart: ..).
A ; B | A immediately follows B, i.e. A.i == B.i + 1, and both are within the same dependency tree (Semgrex counterpart: -).
A ;* B | A follows B, i.e. A.i > B.i, and both are within the same dependency tree (Semgrex counterpart: --).
A $+ B | B is a right immediate sibling of A, i.e. A and B have the same parent and A.i == B.i - 1.
A $- B | B is a left immediate sibling of A, i.e. A and B have the same parent and A.i == B.i + 1.
A $++ B | B is a right sibling of A, i.e. A and B have the same parent and A.i < B.i.
A $-- B | B is a left sibling of A, i.e. A and B have the same parent and A.i > B.i.
A >+ B (new in v3.5.1) | B is a right immediate child of A, i.e. A is a parent of B and A.i == B.i - 1 (not in Semgrex).
A >- B (new in v3.5.1) | B is a left immediate child of A, i.e. A is a parent of B and A.i == B.i + 1 (not in Semgrex).
A >++ B | B is a right child of A, i.e. A is a parent of B and A.i < B.i.
A >-- B | B is a left child of A, i.e. A is a parent of B and A.i > B.i.
A <+ B (new in v3.5.1) | B is a right immediate parent of A, i.e. A is a child of B and A.i == B.i - 1 (not in Semgrex).
A <- B (new in v3.5.1) | B is a left immediate parent of A, i.e. A is a child of B and A.i == B.i + 1 (not in Semgrex).
A <++ B | B is a right parent of A, i.e. A is a child of B and A.i < B.i.
A <-- B | B is a left parent of A, i.e. A is a child of B and A.i > B.i.
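To make the index conditions in the table concrete, the following sketch restates a subset of the operators as plain Python predicates over a list of head indices. It is not spaCy's implementation: HEADS, parent, ancestors, tree_root, same_parent and OPERATORS are hypothetical helpers. It assumes a root token is its own head and has no parent, so a root is never treated as anyone's sibling.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical parse of "Smith founded a healthcare company":
# HEADS[i] is the index of token i's head; the root points to itself.
HEADS: List[int] = [1, 1, 4, 4, 1]


def parent(i: int, heads: List[int]) -> Optional[int]:
    """Return the head of token i, or None if i is the root."""
    return None if heads[i] == i else heads[i]


def ancestors(i: int, heads: List[int]) -> Iterator[int]:
    """Yield the heads above token i, following the dep -> head path."""
    p = parent(i, heads)
    while p is not None:
        yield p
        p = parent(p, heads)


def tree_root(i: int, heads: List[int]) -> int:
    """Return the root of the tree that contains token i."""
    for p in ancestors(i, heads):
        i = p
    return i


def same_tree(a: int, b: int, heads: List[int]) -> bool:
    return tree_root(a, heads) == tree_root(b, heads)


def same_parent(a: int, b: int, heads: List[int]) -> bool:
    pa = parent(a, heads)
    return pa is not None and pa == parent(b, heads)


# Each predicate answers "does A <op> B hold?" for token indices a and b
OPERATORS: Dict[str, Callable[[int, int, List[int]], bool]] = {
    "<": lambda a, b, h: parent(a, h) == b,
    ">": lambda a, b, h: parent(b, h) == a,
    "<<": lambda a, b, h: b in ancestors(a, h),
    ">>": lambda a, b, h: a in ancestors(b, h),
    ".": lambda a, b, h: a == b - 1 and same_tree(a, b, h),
    ".*": lambda a, b, h: a < b and same_tree(a, b, h),
    ";": lambda a, b, h: a == b + 1 and same_tree(a, b, h),
    ";*": lambda a, b, h: a > b and same_tree(a, b, h),
    "$+": lambda a, b, h: same_parent(a, b, h) and a == b - 1,
    "$-": lambda a, b, h: same_parent(a, b, h) and a == b + 1,
    "$++": lambda a, b, h: same_parent(a, b, h) and a < b,
    "$--": lambda a, b, h: same_parent(a, b, h) and a > b,
    ">++": lambda a, b, h: parent(b, h) == a and a < b,
    ">--": lambda a, b, h: parent(b, h) == a and a > b,
}

# Example: every (A, B) pair in the toy tree that satisfies "A > B"
pairs = [(a, b) for a in range(len(HEADS)) for b in range(len(HEADS))
         if OPERATORS[">"](a, b, HEADS)]
print(pairs)
```

Writing the operators as predicates makes the difference between the immediate forms (>, $+, .) and their chained or ordered counterparts (>>, $++, .*) a matter of one condition each.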

DependencyMatcher.__init__ method

Create a DependencyMatcher.

Name | Description | Type
vocab | The vocabulary object, which must be shared with the documents the matcher will operate on. | Vocab
keyword-only
validate | Validate all patterns added to this matcher. | bool
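A minimal construction sketch, assuming spaCy v3: the matcher is built from the same nlp.vocab that will later create the docs, and validate is passed by name. The TypeError check assumes the keyword-only marker in the table reflects a keyword-only parameter in the actual signature.

```python
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.blank("en")

# Share nlp.vocab so the matcher and the docs it processes use the same StringStore
matcher = DependencyMatcher(nlp.vocab, validate=True)

# validate is keyword-only: passing it positionally should raise TypeError
try:
    DependencyMatcher(nlp.vocab, True)
    positional_accepted = True
except TypeError:
    positional_accepted = False
print("positional accepted:", positional_accepted)
```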

DependencyMatcher.__call__ method

DocSpan上查找所有匹配提供模式的标记。

Name | Description | Type
doclike | The Doc or Span to match over. | Union[Doc, Span]
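Calling the matcher returns a list of (match_id, token_ids) tuples, one per match. The sketch below, assuming spaCy v3 and a hypothetical hand-assigned parse, decodes both parts for a sentence where the same rule matches twice. Results are sorted before printing because this sketch does not rely on any particular match order. Only a Doc is used here; the table also allows a Span.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc

nlp = spacy.blank("en")
# Hypothetical parse of "Smith founded a company and Jones founded a bank"
doc = Doc(
    nlp.vocab,
    words=["Smith", "founded", "a", "company", "and", "Jones", "founded", "a", "bank"],
    heads=[1, 1, 3, 1, 1, 6, 1, 8, 6],
    deps=["nsubj", "ROOT", "det", "dobj", "cc", "nsubj", "conj", "det", "dobj"],
)

matcher = DependencyMatcher(nlp.vocab)
matcher.add("FOUNDED", [[
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"LOWER": "founded"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
]])

# Resolve each match_id hash to its string and each token index to its text
results = sorted(
    (nlp.vocab.strings[match_id], [doc[i].text for i in token_ids])
    for match_id, token_ids in matcher(doc)
)
for label, texts in results:
    print(label, texts)
```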

DependencyMatcher.__len__ method

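Get the number of rules added to the dependency matcher. Note that this only returns the number of rules (identical to the number of IDs), not the number of individual patterns.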

Name | Description | Type
RETURNS | The number of rules. | int
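The rules-versus-patterns distinction shows up when one ID holds several patterns. In this sketch (assuming spaCy v3), verb_with is a hypothetical helper that builds a two-node pattern. Three patterns are added under two IDs, so len() should report 2.

```python
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.blank("en")
matcher = DependencyMatcher(nlp.vocab)


def verb_with(dep):
    """Hypothetical helper: a pattern for 'founded' heading a token with the given DEP."""
    return [
        {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
        {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "child",
         "RIGHT_ATTRS": {"DEP": dep}},
    ]


matcher.add("FOUNDED", [verb_with("nsubj")])
matcher.add("FOUNDED", [verb_with("dobj")])  # same ID: extends the existing rule
matcher.add("HAS_AGENT", [verb_with("agent")])
print(len(matcher))  # counts IDs (rules), not the three patterns
```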

DependencyMatcher.__contains__ method

Check whether the matcher contains rules for a match ID.

Name | Description | Type
key | The match ID. | str
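Membership is checked with the standard in operator using the string ID. A short sketch assuming spaCy v3; the ID "ACQUIRED" is a hypothetical rule name that was never added.

```python
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.blank("en")
matcher = DependencyMatcher(nlp.vocab)
matcher.add("FOUNDED", [[
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
]])

# "in" dispatches to __contains__ with the match ID
print("FOUNDED" in matcher, "ACQUIRED" in matcher)
```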

DependencyMatcher.add method

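Add a rule to the matcher, consisting of an ID key, one or more patterns, and an optional callback function to act on the matches. The callback function will receive the arguments matcher, doc, i and matches. If a pattern already exists for the given ID, the patterns will be extended. An existing on_match callback will be overwritten.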

Name | Description | Type
match_id | An ID for the patterns. | str
patterns | A list of match patterns. A pattern consists of a list of dicts, where each dict describes a token in the tree. | List[List[Dict[str, Union[str, Dict]]]]
keyword-only
on_match | Callback function to act on matches. Takes the arguments matcher, doc, i and matches. | Optional[Callable[[DependencyMatcher, Doc, int, List[Tuple]], Any]]
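The callback receives the full list of matches plus the index i of the match that triggered it, so matches[i] is that match's (match_id, token_ids) tuple. A sketch assuming spaCy v3 and a hypothetical hand-assigned parse; on_founded and seen are illustrative names.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc

nlp = spacy.blank("en")
# Hypothetical hand-assigned parse standing in for a trained parser
doc = Doc(nlp.vocab, words=["Smith", "founded", "a", "company"],
          heads=[1, 1, 3, 1], deps=["nsubj", "ROOT", "det", "dobj"])

seen = []


def on_founded(matcher, doc, i, matches):
    """Record the texts of the tokens in the match that triggered this call."""
    match_id, token_ids = matches[i]
    seen.append([doc[t].text for t in token_ids])


matcher = DependencyMatcher(nlp.vocab)
pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
]
matcher.add("FOUNDED", [pattern], on_match=on_founded)
matches = matcher(doc)  # the callback fires once per match during this call
print(seen)
```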

DependencyMatcher.get method

Retrieve the pattern stored for a key. Returns the rule as an (on_match, patterns) tuple containing the callback and available patterns.

Name | Description | Type
key | The ID of the match rule. | str
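Unpacking the returned tuple gives the callback (None when none was registered, as assumed here) and the list of patterns stored under the ID. A sketch assuming spaCy v3.

```python
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.blank("en")
matcher = DependencyMatcher(nlp.vocab)
matcher.add("FOUNDED", [[
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
]])

# No on_match was passed to add(), so the callback slot should be empty
on_match, patterns = matcher.get("FOUNDED")
print(on_match, len(patterns), [node["RIGHT_ID"] for node in patterns[0]])
```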

DependencyMatcher.remove method

Remove a rule from the dependency matcher. A KeyError is raised if the match ID does not exist.

Name | Description | Type
key | The ID of the match rule. | str
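Removing one rule leaves the others in place. A sketch assuming spaCy v3; "HAS_OBJECT" is a hypothetical second rule kept to show that only the named ID is dropped.

```python
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.blank("en")
matcher = DependencyMatcher(nlp.vocab)


def verb_with(dep):
    """Hypothetical helper: a pattern for 'founded' heading a token with the given DEP."""
    return [
        {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
        {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "child",
         "RIGHT_ATTRS": {"DEP": dep}},
    ]


matcher.add("FOUNDED", [verb_with("nsubj")])
matcher.add("HAS_OBJECT", [verb_with("dobj")])

matcher.remove("FOUNDED")  # drops the rule and all patterns stored under it
print("FOUNDED" in matcher, "HAS_OBJECT" in matcher, len(matcher))
```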