Compute API Reference

Compute API Reference#

ComputeMixin module#

class graphistry.compute.ComputeMixin.ComputeMixin(*args, **kwargs)#

Bases: object

chain(*args, **kwargs)#

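Chain a list of ASTObject (node/edge) traversal operations.

Returns the subgraph of matches according to the list of node and edge matchers. If any matcher is named, a correspondingly named boolean column is added to the output.

For direct calls, a convenience List[ASTObject] form is exposed. Internal operations should prefer Chain.

Use engine='cudf' to force automatic GPU acceleration mode.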

Parameters:: ops – List[ASTObject] Various node and edge matchers
Returns:: Plotter
Return type:: Plotter

Example: Find nodes of some type

from graphistry import n

people_nodes_df = g.chain([ n({"type": "person"}) ])._nodes

Example: Find 2-hop edge sequences with some attribute

from graphistry import e_forward

g_2_hops = g.chain([ e_forward({"interesting": True}, hops=2) ])
g_2_hops.plot()

Example: Find any node 1-2 hops out from another node, and label each hop

from graphistry import n, e_undirected

g_2_hops = g.chain([ n({g._node: "a"}), e_undirected(name="hop1"), e_undirected(name="hop2") ])
print('# first-hop edges:', len(g_2_hops._edges[ g_2_hops._edges.hop1 == True ]))

Example: Transaction nodes between two kinds of risky nodes

from graphistry import n, e_forward, e_reverse

g_risky = g.chain([
    n({"risk1": True}),
    e_forward(to_fixed_point=True),
    n({"type": "transaction"}, name="hit"),
    e_reverse(to_fixed_point=True),
    n({"risk2": True})
])
print('# hits:', len(g_risky._nodes[ g_risky._nodes.hit ]))

Example: Filter by multiple node types at each step using is_in

from graphistry import n, e_forward, e_reverse, is_in

g_risky = g.chain([
    n({"type": is_in(["person", "company"])}),
    e_forward({"e_type": is_in(["owns", "reviews"])}, to_fixed_point=True),
    n({"type": is_in(["transaction", "account"])}, name="hit"),
    e_reverse(to_fixed_point=True),
    n({"risk2": True})
])
print('# hits:', len(g_risky._nodes[ g_risky._nodes.hit ]))

Example: Run with automatic GPU acceleration

import cudf
import graphistry

e_gdf = cudf.from_pandas(df)
g1 = graphistry.edges(e_gdf, 's', 'd')
g2 = g1.chain([ ... ])

Example: Run with automatic GPU acceleration, and force GPU mode

import cudf
import graphistry

e_gdf = cudf.from_pandas(df)
g1 = graphistry.edges(e_gdf, 's', 'd')
g2 = g1.chain([ ... ], engine='cudf')

chain_remote(*args, **kwargs)#

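Remotely run a GFQL chain query on a remote dataset.

Uses the latest bound _dataset_id, and uploads the current dataset if none is bound yet. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.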

Parameters:

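chain (Union[Chain, List[ASTObject], Dict[str, JSONVal]]) – GFQL chain query as a Python object or in serialized JSON format
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes the JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, falls back to self._dataset_id. If that is not defined either, uploads the current data, stores that dataset_id, and runs GFQL against it.
output_type (OutputType) – Whether to return nodes and edges ("all", default), a Plottable with just nodes ("nodes"), or a Plottable with just edges ("edges"). For just a dataframe of the resultant graph shape (output_type="shape"), use chain_remote_shape() instead.
format (Optional[FormatType]) – What format to fetch results in. We recommend a columnar format such as parquet, which is the default when output_type is not shape.
df_export_args (Optional[Dict[str, Any]]) – Any additional arguments to pass in when the server parses the data.
node_col_subset (Optional[List[str]]) – When the server returns nodes, which subset of attributes to return. Defaults to all.
edge_col_subset (Optional[List[str]]) – When the server returns edges, which subset of attributes to return. Defaults to all.
engine (Optional[Literal["pandas", "cudf"]]) – Override which run mode GFQL uses. By default, decided from the graph size.
validate (bool) – Whether to locally test the code, and if uploading data, the data. Defaults to true.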

Return type:

Plottable

Example: Explicitly upload graph and return subgraph where nodes have at least one edge

import graphistry
import pandas
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst').upload()
assert g1._dataset_id, "Graph should have uploaded"

g2 = g1.chain_remote([n(), e(), n()])
print(f'dataset id: {g2._dataset_id}, # nodes: {len(g2._nodes)}')

Example: Return subgraph where nodes have at least one edge, with implicit upload

import graphistry
import pandas
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst')
g2 = g1.chain_remote([n(), e(), n()])
print(f'dataset id: {g2._dataset_id}, # nodes: {len(g2._nodes)}')

Example: Return subgraph where nodes have at least one edge, with implicit upload, and force GPU mode

import graphistry
import pandas
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst')
g2 = g1.chain_remote([n(), e(), n()], engine='cudf')
print(f'dataset id: {g2._dataset_id}, # nodes: {len(g2._nodes)}')

chain_remote_shape(*args, **kwargs)#

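Like chain_remote(), except that instead of returning a Plottable, it returns a pd.DataFrame describing the shape of the resulting graph.

Useful as a fast success indicator: when a match finds hits, only the metadata is returned, which avoids returning the full graph.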

Example: Upload graph and compute number of nodes with at least one edge

import graphistry
import pandas
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst').upload()
assert g1._dataset_id, "Graph should have uploaded"

shape_df = g1.chain_remote_shape([n(), e(), n()])
print(shape_df)

Example: Compute number of nodes with at least one edge, with implicit upload, and force GPU mode

import graphistry
import pandas
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
g1 = graphistry.edges(es, 'src', 'dst')

shape_df = g1.chain_remote_shape([n(), e(), n()], engine='cudf')
print(shape_df)

Return type:: DataFrame

collapse(node, attribute, column, self_edges=False, unwrap=False, verbose=False)#

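Topology-aware collapse by a given column attribute, starting at node

Traverses the directed graph from the start node node and collapses clusters of nodes that share the same attribute, so that topology is preserved.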

Parameters:

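node (str | int) – start node to begin the traversal
attribute (str | int) – the given attribute to collapse over within column
column (str | int) – the column of the nodes DataFrame that contains the attribute to collapse over
self_edges (bool) – whether to include self edges in the collapsed graph
unwrap (bool) – whether to unwrap the collapsed graph into a single node
verbose (bool) – whether to print out collapse summary information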

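Returns:

A new Graphistry instance whose nodes and edges DataFrames contain the collapsed nodes and edges given by column attribute. The nodes and edges DataFrames contain six new columns, collapse_{node | edges} and final_{node | edges}, while the original (node, src, dst) columns are left untouched.

Return type:

Plottable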

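drop_nodes(nodes)#: Return g with any nodes/edges involving the given series of node IDs removed

filter_edges_by_dict(*args, **kwargs)#: Filter edges to those that match all values in filter_dict

filter_nodes_by_dict(*args, **kwargs)#: Filter nodes to those that match all values in filter_dict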

get_degrees(col='degree', degree_in='degree_in', degree_out='degree_out')#

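Decorate the nodes table with degree info

Edges must be dataframe-like: pandas, cudf, …

The parameters determine the generated column names

Warning: self-cycles are currently double-counted. This may change.

Example: Generate degree columns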

import pandas as pd
import graphistry

edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
g = graphistry.edges(edges, 's', 'd')
print(g._nodes)  # None
g2 = g.get_degrees()
print(g2._nodes)  # pd.DataFrame with 'id', 'degree', 'degree_in', 'degree_out'

Parameters:

col (str)
degree_in (str)
degree_out (str)
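
To illustrate what the three degree columns mean, the following pure-Python sketch counts in-degree, out-degree, and total degree from a list of (source, destination) pairs. It is not graphistry's implementation: the name degree_table_sketch and the dict-based output are made up for this example, and total degree is assumed to be in-degree plus out-degree, which is consistent with the self-cycle warning above.

```python
from collections import Counter


def degree_table_sketch(edges):
    """Hypothetical helper: map each node to its degree counts.

    edges is a list of (source, destination) pairs.
    """
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    nodes = sorted(set(out_deg) | set(in_deg))
    return {
        node: {
            "degree_in": in_deg[node],
            "degree_out": out_deg[node],
            # Assumption: total degree is the sum of both directions,
            # so a self-loop contributes 2.
            "degree": in_deg[node] + out_deg[node],
        }
        for node in nodes
    }


edges = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]
table = degree_table_sketch(edges)
print(table["c"])
```

In this sketch a self-loop such as ("x", "x") adds one to both the in-degree and the out-degree, so it is counted twice in the total.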

get_indegrees(col='degree_in')#

See get_degrees

Parameters:: col (str)

get_outdegrees(col='degree_out')#

See get_degrees

Parameters:: col (str)

get_topological_levels(level_col='level', allow_cycles=True, warn_cycles=True, remove_self_loops=True)#

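Label nodes on column level_col based on topological sort depth. Supports pandas + cudf, using parallelism within each level computation.

Options:

* allow_cycles: if False and a cycle is detected, throw ValueException; otherwise break cycles by picking a lowest-in-degree node
* warn_cycles: if True and a cycle is detected, proceed with a warning
* remove_self_loops: preprocess by removing self-cycles. Avoids the allow_cycles=False, warn_cycles=True messages.

Example:

import pandas as pd
import graphistry

edges_df = pd.DataFrame({'s': ['a', 'b', 'c', 'd'], 'd': ['b', 'c', 'e', 'e']})
g = graphistry.edges(edges_df, 's', 'd')
g2 = g.get_topological_levels()
g2._nodes.info()  # pd.DataFrame with | 'id' , 'level' |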

Parameters:

level_col (str)
allow_cycles (bool)
warn_cycles (bool)
remove_self_loops (bool)

Return type:

Plottable
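
The idea of labeling nodes by topological depth can be sketched in pure Python with a layered variant of Kahn's algorithm. This is only an illustration under assumptions, not the library's code: the name topological_levels_sketch is hypothetical, levels are assumed to start at 0, self-loops are dropped up front, and cycles are simply rejected with ValueError instead of being broken at a lowest-in-degree node.

```python
def topological_levels_sketch(edges):
    """Hypothetical helper: map each node to its topological level.

    edges is a list of (source, destination) pairs.
    """
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for src, dst in edges:
        if src == dst:
            continue  # preprocess: ignore self-loops
        children[src].append(dst)
        indegree[dst] += 1

    levels = {}
    frontier = sorted(node for node in nodes if indegree[node] == 0)
    level = 0
    while frontier:
        next_frontier = []
        for node in frontier:
            levels[node] = level
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_frontier.append(child)
        frontier = next_frontier
        level += 1

    # Any node left unlabeled sits on (or behind) a cycle.
    if len(levels) != len(nodes):
        raise ValueError("cycle detected")
    return levels


edges = [("a", "b"), ("b", "c"), ("c", "e"), ("d", "e")]
levels = topological_levels_sketch(edges)
print(levels)
```

A node gets its level only after all of its incoming edges have been consumed, so "e" lands one level after "c" even though "d" also points to it.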

hop(*args, **kwargs)#

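Given a graph and some source nodes, return the subgraph of all paths within k hops from the source nodes

This can be faster than the equivalent chain([…]) call, which wraps it with additional steps

See the chain() examples for usage of many of the parameters

g: Plotter
nodes: dataframe with an id column matching g._node. None means all nodes (default).
hops: consider paths of length 1 to 'hops' steps, if any (default 1).
to_fixed_point: keep hopping until no new nodes are found (ignores hops)
direction: 'forward', 'reverse', 'undirected'
edge_match: dict of kv-pairs to match exactly (see also: filter_edges_by_dict)
source_node_match: dict of kv-pairs to match nodes before hopping (including intermediate nodes)
destination_node_match: dict of kv-pairs to match nodes after hopping (including intermediate nodes)
source_node_query: dataframe query to match nodes before hopping (including intermediate nodes)
destination_node_query: dataframe query to match nodes after hopping (including intermediate nodes)
edge_query: dataframe query to match edges before hopping (including intermediate nodes)
return_as_wave_front: exclude the starting nodes from the return, returning only encountered nodes
target_wave_front: only consider these nodes + self._nodes for reachability
engine: 'auto', 'pandas', 'cudf' (GPU)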
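
A minimal sketch of the k-hop idea, written in pure Python over (source, destination) pairs. It covers only the 'forward' direction with a fixed hop count, ignores all the match and query filters, and uses a hypothetical name (hop_forward_sketch); it is meant to show the wavefront expansion, not to mirror how graphistry implements hop().

```python
def hop_forward_sketch(edges, sources, hops=1):
    """Hypothetical helper: edges on paths of up to `hops` forward steps.

    edges is a list of (source, destination) tuples; sources is an
    iterable of starting node ids.
    """
    frontier = set(sources)
    taken = set()
    for _ in range(hops):
        next_frontier = set()
        for src, dst in edges:
            if src in frontier and (src, dst) not in taken:
                taken.add((src, dst))
                next_frontier.add(dst)
        if not next_frontier:
            break  # nothing new was reached, so further hops add nothing
        frontier = next_frontier
    # Preserve the input edge order in the result.
    return [edge for edge in edges if edge in taken]


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
one_hop = hop_forward_sketch(edges, ["a"], hops=1)
two_hops = hop_forward_sketch(edges, ["a"], hops=2)
print(one_hop, two_hops)
```

Passing a hop count larger than the longest path behaves like a fixed-point traversal in this sketch, because the loop stops as soon as a step reaches no new edges.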

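keep_nodes(nodes)#: Limit nodes and edges to those selected by the parameter nodes. For edges, both source and destination must be in nodes. nodes can be a list or series of node IDs, or a dictionary. When it is a dictionary, each key corresponds to a node column, and a node is included when all keys match.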
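
The edge rule for keep_nodes (both endpoints must be kept) can be illustrated with a small pure-Python sketch. The function name keep_nodes_sketch and the list-based inputs are assumptions made for this example; only the ID-list form is shown, not the dictionary form.

```python
def keep_nodes_sketch(nodes, edges, keep):
    """Hypothetical helper: restrict a graph to the node ids in `keep`.

    nodes is a list of ids, edges a list of (source, destination) pairs.
    """
    keep = set(keep)
    kept_nodes = [node for node in nodes if node in keep]
    # An edge survives only if both of its endpoints are kept.
    kept_edges = [(s, d) for s, d in edges if s in keep and d in keep]
    return kept_nodes, kept_edges


nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "a")]
kept_nodes, kept_edges = keep_nodes_sketch(nodes, edges, ["a", "b"])
print(kept_nodes, kept_edges)
```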

materialize_nodes(reuse=True, engine=EngineAbstract.AUTO)#

Generate g._nodes based on g._edges

Uses g._node for the node ID column if it exists, otherwise 'id'

Edges must be dataframe-like: cudf, pandas, …

When reuse=True and g._nodes is not None, use it

Example: Generate nodes

import pandas as pd
import graphistry

edges = pd.DataFrame({'s': ['a','b','c','d'], 'd': ['c','c','e','e']})
g = graphistry.edges(edges, 's', 'd')
print(g._nodes)  # None
g2 = g.materialize_nodes()
print(g2._nodes)  # pd.DataFrame

Parameters:

reuse (bool)
engine (EngineAbstract | str)

Return type:

Plottable
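
Conceptually, materializing nodes means collecting every distinct ID that appears as an edge source or destination. The sketch below shows that idea in pure Python; the name materialize_node_ids_sketch is hypothetical, and the first-appearance ordering is a choice of this example rather than a documented property of the library.

```python
def materialize_node_ids_sketch(edges):
    """Hypothetical helper: distinct node ids found in an edge list."""
    ids = []
    seen = set()
    for src, dst in edges:
        for node in (src, dst):
            if node not in seen:
                seen.add(node)
                ids.append(node)
    return ids


edges = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]
node_ids = materialize_node_ids_sketch(edges)
print(node_ids)
```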

prune_self_edges()#

python_remote_g(*args, **kwargs)#

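Remotely run Python code on a remote dataset, returning a Plottable

Uses the latest bound _dataset_id, and uploads the current dataset if none is bound yet. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.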

Parameters:

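code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes the JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, falls back to self._dataset_id. If that is not defined either, uploads the current data, stores that dataset_id, and runs the code against that data.
format (Optional[FormatType]) – What format to fetch results in. Defaults to 'parquet'.
output_type (Optional[OutputTypeGraph]) – What shape of output to fetch. Defaults to 'all'. Options include 'nodes', 'edges', and 'all' (both). For other variants, see python_remote_shape and python_remote_json.
engine (Literal["pandas", "cudf"]) – Override which run mode GFQL uses. Defaults to "cudf".
run_label (Optional[str]) – Optional label for the run, used for server-side job tracking.
validate (bool) – Whether to locally test the code, and if uploading data, the data. Defaults to true.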

Return type:

Any

Example: Upload data and count the results

import graphistry
import pandas
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
# Parentheses are needed so the method chain can span several lines
g1 = (
    graphistry
    .edges(es, source='src', destination='dst')
    .upload()
)
assert g1._dataset_id is not None, "Graph should have uploaded"
g2 = g1.python_remote_g(
    code='''
        from graphistry import Plottable

        def task(g: Plottable) -> Plottable:
            return g
    ''',
    engine='cudf')
num_edges = len(g2._edges)
print(f'num_edges: {num_edges}')

python_remote_json(*args, **kwargs)#

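Remotely run Python code on a remote dataset, returning JSON

Uses the latest bound _dataset_id, and uploads the current dataset if none is bound yet. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.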

Parameters:

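code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes the JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, falls back to self._dataset_id. If that is not defined either, uploads the current data, stores that dataset_id, and runs the code against that dataset_id.
engine (Literal["pandas", "cudf"]) – Override which run mode GFQL uses. Defaults to "cudf".
run_label (Optional[str]) – Optional label for the run, used for server-side job tracking.
validate (bool) – Whether to locally test the code, and if uploading data, the data. Defaults to true.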

Return type:

Any

Example: Upload data and count the results

import graphistry
import pandas
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
# Parentheses are needed so the method chain can span several lines
g1 = (
    graphistry
    .edges(es, source='src', destination='dst')
    .upload()
)
assert g1._dataset_id is not None, "Graph should have uploaded"
obj = g1.python_remote_json(
    code='''
        from typing import Any, Dict
        from graphistry import Plottable

        def task(g: Plottable) -> Dict[str, Any]:
            return {'num_edges': len(g._edges)}
    ''',
    engine='cudf')
num_edges = obj['num_edges']
print(f'num_edges: {num_edges}')

python_remote_table(*args, **kwargs)#

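Remotely run Python code on a remote dataset, returning a table

Uses the latest bound _dataset_id, and uploads the current dataset if none is bound yet. Note that rebinding calls of edges() and nodes() reset the _dataset_id binding.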

Parameters:

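code (Union[str, Callable[..., object]]) – Python code that includes a top-level function def task(g: Plottable) -> Union[str, Dict].
api_token (Optional[str]) – Optional JWT token. If not provided, refreshes the JWT and uses that.
dataset_id (Optional[str]) – Optional dataset_id. If not provided, falls back to self._dataset_id. If that is not defined either, uploads the current data, stores that dataset_id, and runs the code against that data.
format (Optional[FormatType]) – What format to fetch results in. Defaults to 'parquet'.
output_type (Optional[OutputTypeGraph]) – What shape of output to fetch. Defaults to 'table'. Options include 'table', 'nodes', and 'edges'.
engine (Literal["pandas", "cudf"]) – Override which run mode GFQL uses. Defaults to "cudf".
run_label (Optional[str]) – Optional label for the run, used for server-side job tracking.
validate (bool) – Whether to locally test the code, and if uploading data, the data. Defaults to true.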

Return type:

Any

Example: Upload data and count the results

import graphistry
import pandas
from graphistry import n, e
es = pandas.DataFrame({'src': [0,1,2], 'dst': [1,2,0]})
# Parentheses are needed so the method chain can span several lines
g1 = (
    graphistry
    .edges(es, source='src', destination='dst')
    .upload()
)
assert g1._dataset_id is not None, "Graph should have uploaded"
edges_df = g1.python_remote_table(
    code='''
        from typing import Any
        from graphistry import Plottable

        def task(g: Plottable) -> Any:
            return g._edges
    ''',
    engine='cudf')
num_edges = len(edges_df)
print(f'num_edges: {num_edges}')

to_cudf()#

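Convert to GPU mode by converting any defined nodes and edges to cudf dataframes

When nodes or edges are already cudf dataframes, they are left as is

Parameters:: g (Plottable) – Graphistry object
Returns:: Graphistry object
Return type:: Plottable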

to_pandas()#

Convert to CPU mode by converting any defined nodes and edges to pandas dataframes

When nodes or edges are already pandas dataframes, they are left as is

Return type:: Plottable

Collapse#

graphistry.compute.collapse.check_default_columns_present_and_coerce_to_string(g)#

Helper that sets the COLLAPSE columns on the nodes and edges dataframes, while converting src, dst, and node to dtype(str)

Parameters:: g (Plottable) – graphistry instance
Returns:: graphistry instance

graphistry.compute.collapse.check_has_set(ndf, parent, child)#

graphistry.compute.collapse.collapse_algo(g, child, parent, attribute, column, seen)#

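Basically candy crush over graph properties, done in a topology-aware manner

Checks whether the child node inherits the desired property from the parent node. Four cases of (start_node=parent: has_attribute, children nodes: has_attribute) need to be checked: (T, T), (F, T), (T, F), and (F, F). Starting at the child nodes, we recursively collapse (or not), reassigning nodes and edges.

If (T, T), append the child to the start_node, reassign the node name, and update the edge table with the new name.

If (F, T), start k (potentially new) super nodes, where k is the number of children of start_node. The start node keeps k outgoing edges.

If (T, F), this is the end of the cluster and we keep the new node as is; keep going.

If (F, F), keep going.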

Parameters:

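seen (dict)
g (Plottable) – graphistry instance
child (str | int) – child node to start the traversal; for the first traversal, set child=parent or vice versa.
parent (str | int) – parent node to start the traversal; in the main call, this is set to the child.
attribute (str | int) – attribute to collapse by
column (str | int) – column in the nodes dataframe to collapse on.

Returns:

graphistry instance with collapsed nodes.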

graphistry.compute.collapse.collapse_by(self, parent, start_node, attribute, column, seen, self_edges=False, unwrap=False, verbose=True)#

Main call in collapse.py; collapses nodes and edges by attribute and returns a normalized graphistry object.

Parameters:

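self (Plottable) – graphistry instance
parent (str | int) – parent node to start the traversal; in the main call, this is set to the child.
start_node (str | int)
attribute (str | int) – attribute to collapse by
column (str | int) – column in the nodes dataframe to collapse on.
seen (dict) – dict of previously collapsed pairs; (n1, n2) is treated as different from (n2, n1)
verbose (bool) – bool, defaults to True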
self_edges (bool)
unwrap (bool)

Returns:

graphistry instance with collapsed and normalized nodes.

Return type:

Plottable

graphistry.compute.collapse.collapse_nodes_and_edges(g, parent, child)#

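Asserts that the parent and child nodes in ndf should be collapsed into a super node. Sets a new ndf with COLLAPSE nodes on the graphistry instance g.

This asserts that we should merge the parent and child nodes into a super node. Outside logic controls when that is the case; for example, it assumes that the parent is already in the cluster keys of the COLLAPSE node.

Parameters:

g (Plottable) – graphistry instance
parent (str | int) – node with attribute in column
child (str | int) – node with attribute in column

Returns:

graphistry instance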

graphistry.compute.collapse.get_children(g, node_id, hops=1)#

Helper that gets the children k hops away from node node_id

Parameters:

g (Plottable)
node_id (str | int)
hops (int)

Returns:

graphistry instance of the hops

graphistry.compute.collapse.get_cluster_store_keys(ndf, node)#

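Main innovation in finding and adding to super nodes. Checks whether node is a segment of any collapse_node in the COLLAPSE column of the nodes DataFrame.

Parameters:

ndf (DataFrame) – nodes DataFrame
node (str | int) – node to find

Returns:

DataFrame of booleans indicating whether wrap_key(node) exists in the COLLAPSE column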

graphistry.compute.collapse.get_edges_in_out_cluster(g, node_id, attribute, column, directed=True)#

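Traverses the children of node_id and separates them into in-cluster and out-of-cluster sets, depending on whether they have attribute in the nodes DataFrame column.

Parameters:

g (Plottable) – graphistry instance
node_id (str | int) – node with attribute in column
attribute (str | int) – attribute to collapse on, within column
column (str | int) – column to collapse over
directed (bool)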

graphistry.compute.collapse.get_edges_of_node(g, node_id, outgoing_edges=True, hops=1)#

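Gets the edges of the nodes that are k hops away from node

Parameters:

g (Plottable) – graphistry instance
node_id (str | int) – node from which to find edges
outgoing_edges (bool) – bool; if True, finds all outgoing edges of node. Defaults to True
hops (int) – number of hops from node; default = 1

Returns:

DataFrame of edges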

graphistry.compute.collapse.get_new_node_name(ndf, parent, child)#

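If the child is in a cluster group, melts the names; otherwise creates a new parent name from the parent and child

Parameters:

ndf (DataFrame) – nodes DataFrame
parent (str | int) – node with attribute in column
child (str | int) – node with attribute in column

Returns:

new_parent_name

Return type:

str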

graphistry.compute.collapse.has_edge(g, n1, n2, directed=True)#

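Checks whether n1 and n2 share a (directed or undirected) edge

Parameters:

g (Plottable) – graphistry instance
n1 (str | int) – node to check for an edge to n2
n2 (str | int) – node to check for an edge to n1
directed (bool) – bool; if True, only checks for an outgoing edge n1->n2, otherwise looks for an undirected edge

Returns:

bool, whether an edge exists between n1 and n2

Return type:

bool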
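
The directed versus undirected distinction can be sketched in a few lines of pure Python. This is an illustration only: the name has_edge_sketch and the set-of-pairs edge representation are assumptions of the example, whereas the documented function works on a graphistry instance.

```python
def has_edge_sketch(edges, n1, n2, directed=True):
    """Hypothetical helper: does an edge connect n1 and n2?

    edges is a set of (source, destination) pairs.
    """
    if (n1, n2) in edges:
        return True
    # Only the undirected check is allowed to use the reverse edge.
    return (not directed) and (n2, n1) in edges


edges = {("a", "b"), ("b", "c")}
print(has_edge_sketch(edges, "b", "a"))
print(has_edge_sketch(edges, "b", "a", directed=False))
```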

graphistry.compute.collapse.has_property(g, ref_node, attribute, column)#

Checks whether ref_node is in the nodes dataframe in column with attribute

Parameters:

g (Plottable) – graphistry instance
ref_node (str | int) – node to check for attribute in column
attribute (str | int)
column (str | int)

Returns:

bool

Return type:

bool

graphistry.compute.collapse.in_cluster_store_keys(ndf, node)#

Checks whether node is in a collapse_node in the COLLAPSE column of the nodes DataFrame

Parameters:

ndf (DataFrame) – nodes DataFrame
node (str | int) – node to find

Returns:

bool

Return type:

bool

graphistry.compute.collapse.melt(ndf, node)#

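Reduces node if it is in the cluster store, otherwise passes it through. Example:

node = "4" will take any sequences from get_cluster_store_keys, such as "1 2 3" and "4 3 6", and return "1 2 3 4 6" when they share a common entry (3).

Parameters:

ndf (DataFrame) – nodes DataFrame
node (str | int) – node to melt

Returns:

new parent name of the super node

Return type:

str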

graphistry.compute.collapse.normalize_graph(g, self_edges=False, unwrap=False)#

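Final step after the collapse traversals are done: removes duplicates and moves the COLLAPSE columns into the respective (node, src, dst) columns of the nodes and edges dataframes of the Graphistry instance g.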

Parameters:

g (Plottable) – graphistry instance
self_edges (bool) – bool, whether to keep duplicates from ndf and edf; defaults to False
unwrap (bool) – bool, whether to unwrap node text wrapped with ~; defaults to False

Returns:

final graphistry instance

Return type:

Plottable

graphistry.compute.collapse.reduce_key(key)#

Converts "1 1 2 1 2 3" into "1 2 3"

Parameters:: key (str | int) – node name
Returns:: new node name with duplicates removed
Return type:: str
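
A minimal sketch of the duplicate-removal step, assuming the key is a space-separated list of node names and that the first occurrence of each name is kept in order. The function name reduce_key_sketch is hypothetical; the library's own implementation may differ in details.

```python
def reduce_key_sketch(key):
    """Hypothetical helper: drop repeated names from a space-separated key."""
    parts = str(key).split()
    # dict.fromkeys keeps insertion order, so first occurrences win.
    return " ".join(dict.fromkeys(parts))


reduced = reduce_key_sketch("1 1 2 1 2 3")
print(reduced)
```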

graphistry.compute.collapse.unpack(g)#

Helper method that unpacks a graphistry instance

Example:

ndf, edf, src, dst, node = unpack(g)

Parameters:: g (Plottable) – graphistry instance
Returns:: nodes DataFrame, edges DataFrame, source column, destination column, node column

graphistry.compute.collapse.unwrap_key(name)#

Unwraps a node name: ~name~ -> name

Parameters:: name (str | int) – node to unwrap
Returns:: unwrapped node name
Return type:: str

graphistry.compute.collapse.wrap_key(name)#

Wraps a node name: name -> ~name~

Parameters:: name (str | int) – node name
Returns:: wrapped node name
Return type:: str
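
The wrap and unwrap helpers are inverses of each other, which the following sketch makes concrete. The names wrap_key_sketch and unwrap_key_sketch are hypothetical, and removing every ~ character when unwrapping is an assumption of this example; node names that themselves contain ~ would not round-trip.

```python
def wrap_key_sketch(name):
    """Hypothetical helper: name -> ~name~."""
    return f"~{name}~"


def unwrap_key_sketch(name):
    """Hypothetical helper: ~name~ -> name (assumes ~ is only a wrapper)."""
    return str(name).replace("~", "")


wrapped = wrap_key_sketch("acct_1")
unwrapped = unwrap_key_sketch(wrapped)
print(wrapped, unwrapped)
```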

Conditional#

class graphistry.compute.conditional.ConditionalMixin(*args, **kwargs)#

Bases: object

conditional_graph(x, given, kind='nodes', *args, **kwargs)#

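Conditional graph – p(x|given) = p(x, given) / p(given)

Useful for finding the conditional probability of a node or edge attribute

The returned dataframe sums to 1 on each column

Parameters:

x – target column
given – the dependent column
kind – 'nodes' or 'edges'
args/kwargs – additional arguments for g.bind(…)

Returns:

A graphistry instance with the conditional graph, whose edges are weighted by the conditional probability. Edges run between x and given; keep in mind that g._edges.columns = [given, x, _probs]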

conditional_probs(x, given, kind='nodes', how='index')#

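Produces a dense matrix of the conditional probability of x given y=given

Args:

x: the column variable of interest, given the column y=given
given: the variable held fixed
df (pd.DataFrame): dataframe
how (str, optional): one of 'column' or 'index'. Defaults to 'index'.
kind (str, optional): 'nodes' or 'edges'. Defaults to 'nodes'.

Returns:: pd.DataFrame: the conditional probability of x given the column y, as a dataframe-like dense array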

graphistry.compute.conditional.conditional_probability(x, given, df)#

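Conditional probability function over categorical variables: p(x | given) = p(x, given) / p(given)

Args:

x: the column variable of interest, given the column 'given'
given: the variable held fixed
df: dataframe with columns [given, x]

Returns:: pd.DataFrame: the conditional probability of x given the column 'given'

Parameters:: df (DataFrame)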
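
The formula p(x | given) = p(x, given) / p(given) reduces to a ratio of counts for categorical data. The sketch below works that out in pure Python over a list of row dicts; the name conditional_probability_sketch, the row representation, and the sample columns "type" and "color" are all assumptions of this example, whereas the documented function operates on a dataframe.

```python
from collections import Counter


def conditional_probability_sketch(rows, x, given):
    """Hypothetical helper: p(x | given) keyed by (given_value, x_value)."""
    joint = Counter((row[given], row[x]) for row in rows)
    marginal = Counter(row[given] for row in rows)
    # count(x, given) / count(given) equals p(x, given) / p(given)
    return {
        (g_val, x_val): count / marginal[g_val]
        for (g_val, x_val), count in joint.items()
    }


rows = [
    {"type": "a", "color": "red"},
    {"type": "a", "color": "blue"},
    {"type": "b", "color": "red"},
    {"type": "a", "color": "red"},
]
probs = conditional_probability_sketch(rows, x="color", given="type")
print(probs)
```

For each value of the given column, the probabilities over x add up to 1, which matches the "sums to 1" property described for the conditional graph.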

graphistry.compute.conditional.probs(x, given, df, how='index')#

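Produces a dense matrix of the conditional probability of x given y=given

Args:

x: the column variable of interest, given the column 'y'
given: the variable held fixed
df (pd.DataFrame): dataframe
how (str, optional): one of 'column' or 'index'. Defaults to 'index'.

Returns:: pd.DataFrame: the conditional probability of x given the column 'y', as a dataframe-like dense array

Parameters:: df (DataFrame)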

Filter by dictionary#

graphistry.compute.filter_by_dict.filter_by_dict(df, filter_dict=None, engine=EngineAbstract.AUTO)#

Returns df restricted to the rows that match all values in filter_dict

Parameters:

df (Any)
filter_dict (dict | None)
engine (EngineAbstract | str)

Return type:

Any
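
The "match all values" rule can be shown with a short pure-Python sketch over a list of row dicts. The name filter_rows_sketch and the row representation are assumptions of this example, and only exact-value matching is illustrated; the documented function works on dataframes.

```python
def filter_rows_sketch(rows, filter_dict=None):
    """Hypothetical helper: keep rows whose fields equal every filter value."""
    if not filter_dict:
        return list(rows)  # no filter: everything passes
    return [
        row
        for row in rows
        if all(key in row and row[key] == value
               for key, value in filter_dict.items())
    ]


rows = [
    {"type": "person", "risk": True},
    {"type": "person", "risk": False},
    {"type": "company", "risk": True},
]
risky_people = filter_rows_sketch(rows, {"type": "person", "risk": True})
print(risky_people)
```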

graphistry.compute.filter_by_dict.filter_edges_by_dict(self, filter_dict, engine=EngineAbstract.AUTO)#

Filter edges to those that match all values in filter_dict

Parameters:

self (Plottable)
filter_dict (dict)
engine (EngineAbstract | str)

Return type:

Plottable

graphistry.compute.filter_by_dict.filter_nodes_by_dict(self, filter_dict, engine=EngineAbstract.AUTO)#

Filter nodes to those that match all values in filter_dict

Parameters:

self (Plottable)
filter_dict (dict)
engine (EngineAbstract | str)

Return type:

Plottable
