DependencyMatcher
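The DependencyMatcher follows the same API as the Matcher and PhraseMatcher and lets you match on dependency trees using Semgrex operators. It requires a pretrained DependencyParser or another component that sets the Token.dep and Token.head attributes. See the usage guide for examples.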
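Pattern format
A pattern added to the DependencyMatcher consists of a list of dictionaries, with each dictionary describing a token to match. Except for the first dictionary, which defines an anchor token using only RIGHT_ID and RIGHT_ATTRS, each dictionary should have the following keys: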
| Name | Description |
|---|---|
| LEFT_ID | The name of the left-hand node in the relation, which has been defined in an earlier node. str |
| REL_OP | An operator that describes how the two nodes are related. str |
| RIGHT_ID | A unique name for the right-hand node in the relation. str |
| RIGHT_ATTRS | The token attributes to match for the right-hand node in the same format as patterns provided to the regular token-based Matcher. Dict[str, Any] |
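As a rough illustration of the keys above, the following sketch writes out a pattern for a verb with a subject and an object. The node names (verb, subject, object), the attribute values, and the helper check_left_ids are assumptions made for this example and are not part of spaCy. The helper only restates the rule that LEFT_ID must name a node defined earlier.

```python
# Illustrative pattern: an anchor verb plus a subject child and an object child.
pattern = [
    # Anchor token: only RIGHT_ID and RIGHT_ATTRS.
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
    # verb > subject: the verb is the immediate head of the subject.
    {
        "LEFT_ID": "verb",
        "REL_OP": ">",
        "RIGHT_ID": "subject",
        "RIGHT_ATTRS": {"DEP": "nsubj"},
    },
    # verb > object: the verb is the immediate head of the object.
    {
        "LEFT_ID": "verb",
        "REL_OP": ">",
        "RIGHT_ID": "object",
        "RIGHT_ATTRS": {"DEP": "dobj"},
    },
]


def check_left_ids(pattern):
    """Hypothetical helper: True if every LEFT_ID names an earlier RIGHT_ID."""
    seen = set()
    for index, node in enumerate(pattern):
        if index == 0:
            # The anchor must not declare a relation.
            if "LEFT_ID" in node or "REL_OP" in node:
                return False
        elif node.get("LEFT_ID") not in seen:
            return False
        seen.add(node["RIGHT_ID"])
    return True
```

Dropping the anchor, for instance with pattern[1:], makes the check fail because the first remaining node refers to a verb node that was never defined.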
Operators
The following operators are supported by the DependencyMatcher, most of which come directly from Semgrex:
| Symbol | Description |
|---|---|
| A < B | A is the immediate dependent of B. |
| A > B | A is the immediate head of B. |
| A << B | A is the dependent in a chain to B following dep → head paths. |
| A >> B | A is the head in a chain to B following head → dep paths. |
| A . B | A immediately precedes B, i.e. A.i == B.i - 1, and both are within the same dependency tree. |
| A .* B | A precedes B, i.e. A.i < B.i, and both are within the same dependency tree (Semgrex counterpart: ..). |
| A ; B | A immediately follows B, i.e. A.i == B.i + 1, and both are within the same dependency tree (Semgrex counterpart: -). |
| A ;* B | A follows B, i.e. A.i > B.i, and both are within the same dependency tree (Semgrex counterpart: --). |
| A $+ B | B is a right immediate sibling of A, i.e. A and B have the same parent and A.i == B.i - 1. |
| A $- B | B is a left immediate sibling of A, i.e. A and B have the same parent and A.i == B.i + 1. |
| A $++ B | B is a right sibling of A, i.e. A and B have the same parent and A.i < B.i. |
| A $-- B | B is a left sibling of A, i.e. A and B have the same parent and A.i > B.i. |
| A >+ B (v3.5.1) | B is a right immediate child of A, i.e. A is a parent of B and A.i == B.i - 1 (not in Semgrex). |
| A >- B (v3.5.1) | B is a left immediate child of A, i.e. A is a parent of B and A.i == B.i + 1 (not in Semgrex). |
| A >++ B | B is a right child of A, i.e. A is a parent of B and A.i < B.i. |
| A >-- B | B is a left child of A, i.e. A is a parent of B and A.i > B.i. |
| A <+ B (v3.5.1) | B is a right immediate parent of A, i.e. A is a child of B and A.i == B.i - 1 (not in Semgrex). |
| A <- B (v3.5.1) | B is a left immediate parent of A, i.e. A is a child of B and A.i == B.i + 1 (not in Semgrex). |
| A <++ B | B is a right parent of A, i.e. A is a child of B and A.i < B.i. |
| A <-- B | B is a left parent of A, i.e. A is a child of B and A.i > B.i. |
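To make the index-based definitions in the table concrete, here is a small pure-Python sketch that restates a few operators as predicates over a toy tree. It is not spaCy's implementation: the heads list, the convention that the root points to itself, and all function names are assumptions made for this illustration.

```python
# Toy dependency tree for "Smith founded a company".
# heads[i] is the index of token i's head; the root ("founded") points to itself.
words = ["Smith", "founded", "a", "company"]
heads = [1, 1, 3, 1]


def is_root(i):
    return heads[i] == i


def immediate_dependent(a, b):
    """A < B: A is the immediate dependent of B."""
    return a != b and heads[a] == b


def immediate_head(a, b):
    """A > B: A is the immediate head of B."""
    return immediate_dependent(b, a)


def dependent_in_chain(a, b):
    """A << B: B is reached from A by following dep -> head links."""
    current = a
    while not is_root(current):
        current = heads[current]
        if current == b:
            return True
    return False


def immediately_precedes(a, b):
    """A . B: A.i == B.i - 1 (the toy example has a single tree)."""
    return a == b - 1


def right_sibling(a, b):
    """A $++ B: A and B share a parent and A.i < B.i."""
    if is_root(a) or is_root(b):
        return False  # the root has no parent, so it has no siblings
    return heads[a] == heads[b] and a < b


def right_child(a, b):
    """A >++ B: A is a parent of B and A.i < B.i."""
    return immediate_head(a, b) and a < b
```

In this tree, "a" is a dependent in a chain to "founded" (a → company → founded), and "company" is a right sibling of "Smith" because both attach to "founded".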
DependencyMatcher.__init__ method
Create a DependencyMatcher.
| Name | Description |
|---|---|
| vocab | The vocabulary object, which must be shared with the documents the matcher will operate on. Vocab |
| keyword-only | |
| validate | Validate all patterns added to this matcher. bool |
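A minimal construction sketch follows. It assumes spaCy v3 is installed and uses a bare Vocab() so that no trained pipeline has to be downloaded; in a real pipeline the vocabulary would typically be nlp.vocab.

```python
from spacy.matcher import DependencyMatcher
from spacy.vocab import Vocab

# Assumption: an empty Vocab stands in for nlp.vocab of a loaded pipeline.
vocab = Vocab()

# validate is keyword-only; the same vocab must back the Docs matched later.
matcher = DependencyMatcher(vocab, validate=True)
```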
DependencyMatcher.__call__ method
Find all tokens matching the supplied patterns on the Doc or Span.
| Name | Description |
|---|---|
| doclike | The Doc or Span to match over. Union[Doc, Span] |
| RETURNS | A list of (match_id, token_ids) tuples, describing the matches. The match_id is the ID of the match pattern and token_ids is a list of token indices matched by the pattern, where the position of each token in the list corresponds to the position of the node specification in the pattern. List[Tuple[int, List[int]]] |
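The sketch below shows one way to call the matcher and read its return value. To stay self-contained it builds a Doc by hand with heads and deps instead of running a trained parser; the sentence, the dependency labels, and the "FOUNDED" ID are assumptions chosen for illustration.

```python
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()

# Hand-annotated parse (assumption: normally produced by a DependencyParser).
# heads holds the absolute index of each token's head.
doc = Doc(
    vocab,
    words=["Smith", "founded", "a", "company"],
    heads=[1, 1, 3, 1],
    deps=["nsubj", "ROOT", "det", "dobj"],
)

matcher = DependencyMatcher(vocab)
pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject", "RIGHT_ATTRS": {"DEP": "nsubj"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "object", "RIGHT_ATTRS": {"DEP": "dobj"}},
]
matcher.add("FOUNDED", [pattern])

matches = matcher(doc)
for match_id, token_ids in matches:
    # match_id is a hash; the string store maps it back to the ID string.
    rule_name = vocab.strings[match_id]
    # token_ids follow the order of the nodes in the pattern.
    matched_words = [doc[i].text for i in token_ids]
    print(rule_name, matched_words)
```

Because token_ids are ordered like the pattern nodes, the first index here belongs to the verb node, the second to subject, and the third to object, regardless of their order in the sentence.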
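DependencyMatcher.__len__ method
Get the number of rules added to the dependency matcher. Note that this only returns the number of rules (identical to the number of IDs), not the number of individual patterns.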
| Name | Description |
|---|---|
| RETURNS | The number of rules. int |
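The difference between rules and patterns can be seen in a short sketch. The IDs and patterns are made up for this example, and an empty Vocab() is assumed in place of a pipeline's vocabulary.

```python
from spacy.matcher import DependencyMatcher
from spacy.vocab import Vocab

matcher = DependencyMatcher(Vocab())

subject_pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject", "RIGHT_ATTRS": {"DEP": "nsubj"}},
]
object_pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "object", "RIGHT_ATTRS": {"DEP": "dobj"}},
]

# Two patterns under one ID still count as a single rule.
matcher.add("FOUNDED", [subject_pattern, object_pattern])
rules_after_first_add = len(matcher)

# A second ID adds a second rule.
matcher.add("FOUNDED_SUBJECT", [subject_pattern])
rules_after_second_add = len(matcher)
```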
DependencyMatcher.__contains__ method
Check whether the matcher contains rules for a match ID.
| Name | Description |
|---|---|
| key | The match ID. str |
| RETURNS | Whether the matcher contains rules for this match ID. bool |
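A brief sketch of the membership test, assuming an empty Vocab() and illustrative IDs:

```python
from spacy.matcher import DependencyMatcher
from spacy.vocab import Vocab

matcher = DependencyMatcher(Vocab())
pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject", "RIGHT_ATTRS": {"DEP": "nsubj"}},
]
matcher.add("FOUNDED", [pattern])

has_founded = "FOUNDED" in matcher   # an ID that was added
has_missing = "MISSING" in matcher   # an ID that was never added
```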
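DependencyMatcher.add method
Add a rule to the matcher, consisting of an ID key, one or more patterns, and an optional callback function to act on the matches. The callback function receives the arguments matcher, doc, i and matches. If patterns already exist for the given ID, the new patterns are appended to them. An existing on_match callback is overwritten.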
| Name | Description |
|---|---|
| match_id | An ID for the patterns. str |
| patterns | A list of match patterns. A pattern consists of a list of dicts, where each dict describes a token in the tree. List[List[Dict[str, Union[str, Dict]]]] |
| keyword-only | |
| on_match | Callback function to act on matches. Takes the arguments matcher, doc, i and matches. Optional[Callable[[DependencyMatcher, Doc, int, List[Tuple]], Any]] |
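The following sketch registers a rule together with an on_match callback. The callback name collect_match, the collected list, and the hand-built Doc are assumptions for illustration; a real pipeline would supply the parse.

```python
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()
doc = Doc(
    vocab,
    words=["Smith", "founded", "a", "company"],
    heads=[1, 1, 3, 1],  # absolute head indices, annotated by hand
    deps=["nsubj", "ROOT", "det", "dobj"],
)

collected = []


def collect_match(matcher, doc, i, matches):
    """Called once per match; i is the index of the current match in matches."""
    match_id, token_ids = matches[i]
    collected.append([doc[token_id].text for token_id in token_ids])


pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject", "RIGHT_ATTRS": {"DEP": "nsubj"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "object", "RIGHT_ATTRS": {"DEP": "dobj"}},
]

matcher = DependencyMatcher(vocab)
# on_match is keyword-only.
matcher.add("FOUNDED", [pattern], on_match=collect_match)

# Calling the matcher triggers the callback for every match it finds.
matcher(doc)
```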
DependencyMatcher.get method
Retrieve the patterns stored for a key. Returns the rule as an (on_match, patterns) tuple containing the callback and the available patterns.
| Name | Description |
|---|---|
| key | The ID of the match rule. str |
| RETURNS | The rule, as an (on_match, patterns) tuple. Tuple[Optional[Callable], List[List[Union[Dict, Tuple]]]] |
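A short sketch of reading a rule back, assuming an empty Vocab() and an illustrative rule added without a callback:

```python
from spacy.matcher import DependencyMatcher
from spacy.vocab import Vocab

matcher = DependencyMatcher(Vocab())
pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject", "RIGHT_ATTRS": {"DEP": "nsubj"}},
]
matcher.add("FOUNDED", [pattern])

# Unpack the stored rule: the callback (None here) and the list of patterns.
on_match, patterns = matcher.get("FOUNDED")
```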
DependencyMatcher.remove method
Remove a rule from the dependency matcher. A KeyError is raised if the match ID does not exist.
| Name | Description |
|---|---|
| key | The ID of the match rule. str |
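Removing a rule can be sketched as follows, again assuming an empty Vocab() and a made-up rule. The example checks membership before removing, so it does not rely on the error raised for an unknown ID.

```python
from spacy.matcher import DependencyMatcher
from spacy.vocab import Vocab

matcher = DependencyMatcher(Vocab())
pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject", "RIGHT_ATTRS": {"DEP": "nsubj"}},
]
matcher.add("FOUNDED", [pattern])

present_before = "FOUNDED" in matcher

# Guard with a membership test so an unknown ID is never passed to remove.
if "FOUNDED" in matcher:
    matcher.remove("FOUNDED")

present_after = "FOUNDED" in matcher
```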