Session
Session object
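- class graphscope.Session(config: Config | str | None = None, api_client: ApiClient | None = None, **kw)
A class for interacting with a GraphScope graph computation service cluster.
A Session object encapsulates the environment in which Operation objects are executed and evaluated.
A session may own resources, and it is important to release them when they are no longer needed. To do so, call the close() method on the session.
A session can register itself as the default session with as_default(), after which all operations use that default session. When the session is closed, it automatically unregisters itself as the default session. The following example demonstrates its usage: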
>>> import graphscope as gs

>>> # use session object explicitly
>>> sess = gs.session()
>>> g = sess.g()
>>> pg = g.project(vertices={'v': []}, edges={'e': ['dist']})
>>> r = gs.sssp(pg, 4)
>>> sess.close()

>>> # or use a session as default
>>> sess = gs.session().as_default()
>>> g = gs.g()
>>> pg = g.project(vertices={'v': []}, edges={'e': ['dist']})
>>> r = gs.sssp(pg, 4)
>>> sess.close()
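Because a session may own cluster resources, it helps to guarantee that close() runs even when an intermediate step raises. The following is a minimal sketch using the standard library's contextlib.closing. FakeSession is a hypothetical stand-in (not part of GraphScope) so that the snippet runs without a cluster. With a real session, the same pattern would presumably read `with closing(gs.session()) as sess:`.

```python
from contextlib import closing


class FakeSession:
    """Hypothetical stand-in for a session; it only models close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real session would release its cluster resources here.
        self.closed = True


def run_job(session_factory):
    """Run some work and guarantee that close() is called, even on errors."""
    sess = session_factory()
    try:
        with closing(sess):
            raise RuntimeError("simulated failure while computing")
    except RuntimeError:
        pass  # the error is handled here; the session is already closed
    return sess


sess = run_job(FakeSession)
print(sess.closed)
```

The design choice is to tie resource release to the `with` block's exit rather than to a trailing close() call that an exception could skip.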
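We support setting up a service cluster and creating an RPC session in the following way:
The GraphScope graph computation service runs in a cluster managed by Kubernetes.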
>>> s = graphscope.session()
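In addition, Session provides several keyword parameters for defining the cluster. You can use the parameter k8s_gs_image to specify the image for all engine pods, and k8s_engine_cpu or k8s_engine_mem to specify resources. More parameter details can be found in the __init__() method.
>>> s = graphscope.session(
...     k8s_vineyard_cpu=0.1,
...     k8s_vineyard_mem="256Mi",
...     vineyard_shared_mem="4Gi",
...     k8s_engine_cpu=0.1,
...     k8s_engine_mem="256Mi")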
Alternatively, all parameters can be provided through a configuration file or a configuration dictionary.
>>> s = graphscope.session(config='/tmp/config.yaml')
>>> # or
>>> s = graphscope.session(config={'k8s_engine_cpu': 5, 'k8s_engine_mem': '5Gi'})
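The description of the config parameter notes that a config overrides explicitly specified parameters. The sketch below illustrates that precedence with plain dictionaries. resolve_params is a hypothetical helper, not GraphScope's merge logic, and the defaults are copied from the parameter list of __init__().

```python
# Defaults copied from the documented parameter list of __init__().
DEFAULTS = {"num_workers": 2, "k8s_engine_cpu": 1, "k8s_engine_mem": "4Gi"}


def resolve_params(config=None, **kw):
    """Merge defaults, keyword arguments, and a config dict.

    Assumption: entries in `config` win over explicit keyword arguments,
    as the description of the `config` parameter states.
    """
    resolved = dict(DEFAULTS)
    resolved.update(kw)            # explicit keyword arguments
    resolved.update(config or {})  # config entries override them
    return resolved


params = resolve_params(
    config={"k8s_engine_cpu": 5, "k8s_engine_mem": "5Gi"},
    k8s_engine_cpu=0.1,
    num_workers=4,
)
print(params)
```

Here k8s_engine_cpu=0.1 is passed explicitly but the config value 5 wins, while num_workers=4 survives because the config does not mention it.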
- __init__(config: Config | str | None = None, api_client: ApiClient | None = None, **kw)
Construct a new GraphScope session.
- Parameters:
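config (dict or str, optional) – A configuration dictionary or file describing how to launch the GraphScope instance. If it is a string, it is treated as a path and the configuration file is read to build the session (if the file exists). If not specified, the global default configuration is used. Note that this overrides explicitly specified parameters. Defaults to None.
api_client – The kube API client used for the Kubernetes cluster.
kw –
Configurable keys, kept for backward compatibility. For more details, see the Config class in config.py.
- addr (str, optional): The endpoint of a pre-launched GraphScope instance, in ‘<ip>:<port>’ format.
A new session id is generated for each session connection.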
- mode (str, optional): Either eager or lazy. Defaults to eager.
- Eager execution is a flexible platform for research and experimentation. It provides:
An intuitive interface: quickly test on small data.
Easier debugging: call ops directly to inspect running models and test changes.
- Lazy execution means GraphScope does not process the data until it has to.
It only gathers the operations fed into it into a DAG, and processes them when sess.run(fetches) is executed.
- cluster_type (str, optional): Deploy the GraphScope instance on hosts or on a k8s cluster. Defaults to k8s.
Available options: “k8s” and “hosts”. Note that hosts mode only supports deployment on localhost.
num_workers (int, optional): The number of workers to launch GraphScope engine. Defaults to 2.
- preemptive (bool, optional): If True, the GraphScope instance treats resource parameters
(e.g. k8s_coordinator_cpu) as limits and provides the minimum available value as requests. This gives the pod a Burstable QoS class, so it can be preempted by pods with a higher QoS class. Otherwise, requests and limits are both set to the same value.
- k8s_namespace (str, optional): The namespace in which all resources are created.
If this parameter is missing, GraphScope tries to read the namespace from the Kubernetes context. If the namespace does not exist, a random namespace is created and deleted afterwards. Defaults to None.
- k8s_service_type (str, optional): Determines how the GraphScope service is exposed.
Valid options are NodePort and LoadBalancer. Defaults to NodePort.
k8s_image_registry (str, optional): The GraphScope image registry.
k8s_image_repository (str, optional): The GraphScope image repository.
k8s_image_tag (str, optional): The GraphScope image tag.
k8s_image_pull_policy (str, optional): Kubernetes image pull policy. Defaults to “IfNotPresent”.
k8s_image_pull_secrets (List[str], optional): A list of secret names used to authorize image pulls.
k8s_vineyard_image (str, optional): The image of vineyard.
- k8s_vineyard_deployment (str, optional): The name of the vineyard deployment to use. GraphScope tries to
discover the deployment in the Kubernetes cluster and uses it if it exists; otherwise it falls back to launching a bundled vineyard container.
k8s_vineyard_cpu (float, optional): Number of CPU cores requested for the vineyard container. Defaults to 0.2.
k8s_vineyard_mem (str, optional): Amount of memory requested for the vineyard container. Defaults to ‘256Mi’.
k8s_engine_cpu (float, optional): Number of CPU cores requested for the engine container. Defaults to 1.
k8s_engine_mem (str, optional): Amount of memory requested for the engine container. Defaults to ‘4Gi’.
k8s_coordinator_cpu (float, optional): Number of CPU cores requested for the coordinator. Defaults to 0.5.
k8s_coordinator_mem (str, optional): Amount of memory requested for the coordinator. Defaults to ‘512Mi’.
- etcd_addrs (str, optional): The address of an external etcd cluster,
in a format like ‘etcd01:port,etcd02:port,etcd03:port’.
- k8s_mars_worker_cpu (float, optional):
Minimum number of CPU cores requested for the Mars worker container. Defaults to 0.2.
- k8s_mars_worker_mem (str, optional):
Minimum amount of memory requested for the Mars worker container. Defaults to ‘4Mi’.
- k8s_mars_scheduler_cpu (float, optional):
Minimum number of CPU cores requested for the Mars scheduler container. Defaults to 0.2.
- k8s_mars_scheduler_mem (str, optional):
Minimum amount of memory requested for the Mars scheduler container. Defaults to ‘4Mi’.
- k8s_coordinator_pod_node_selector (dict, optional):
Node selector for the coordinator pod on k8s. Defaults to None. See also: https://tinyurl.com/3nx6k7ph
- k8s_engine_pod_node_selector (dict, optional):
Node selector for the engine pod on k8s. Defaults to None. See also: https://tinyurl.com/3nx6k7ph
- with_mars (bool, optional):
Launch GraphScope with Mars. Defaults to False.
- enabled_engines (str, optional):
Select a subset of engines to enable. Only meaningful in k8s mode.
- with_dataset (bool, optional):
Create a container and mount the Aliyun demo dataset bucket at the path /dataset.
- k8s_volumes (dict, optional): A dict of k8s volumes, each representing a directory containing data that is
accessible to the containers in a pod. Defaults to {}.
For example, you can mount a host path with:
k8s_volumes = {
    "my-data": {
        "type": "hostPath",
        "field": {
            "path": "<path>",
            "type": "Directory"
        },
        "mounts": [
            {"mountPath": "<path1>"},
            {"mountPath": "<path2>"}
        ]
    }
}
Or you can mount a PVC with:
k8s_volumes = {
    "my-data": {
        "type": "persistentVolumeClaim",
        "field": {
            "claimName": "your-pvc-name"
        },
        "mounts": [
            {"mountPath": "<path1>"}
        ]
    }
}
Also, you can mount a single volume with:
k8s_volumes = {
    "my-data": {
        "type": "hostPath",
        "field": {xxx},
        "mounts": {
            "mountPath": "<path1>"
        }
    }
}
- timeout_seconds (int, optional): Timeout for waiting for the service to become ready (or for it to be deleted, if
k8s_waiting_for_delete is True).
- dangling_timeout_seconds (int, optional): Number of seconds after a client disconnects before the
coordinator kills this GraphScope instance. Defaults to 600. The value is expected to be greater than 5 (the heartbeat interval). Set it to -1 to disable the dangling check.
- k8s_deploy_mode (str, optional): The deploy mode of engines on the Kubernetes cluster. Defaults to eager.
eager: create all engine pods at once.
lazy: create engine pods when they are called.
k8s_waiting_for_delete (bool, optional): Whether to wait for the service to be deleted. Defaults to False.
- k8s_client_config (dict, optional):
Configurable parameters for connecting to a remote k8s cluster.
config_file: Name of the kube-config file, e.g. “~/.kube/config”.
- reconnect (bool, optional): When connecting to a pre-launched GraphScope cluster with
addr, the connect request is rejected if an existing session is still connected. There are cases where the session still exists but the user’s client has lost its connection to the backend, e.g., in a Jupyter notebook. dangling_timeout_seconds covers this case, but a more deterministic behavior is preferable.
If reconnect is True, the existing session is reused. It is the user’s responsibility to ensure that no other client is actively using that session.
Defaults to False.
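The k8s_volumes layouts shown in the parameter list can be assembled programmatically. The helper below is a hypothetical convenience function (not part of GraphScope) that only mirrors the documented hostPath layout. The paths "/data", "/mnt/a" and "/mnt/b" are placeholders chosen for illustration.

```python
def host_path_volume(name, host_path, mount_paths):
    """Build one k8s_volumes entry of type hostPath.

    `mount_paths` may be a single string or a list of strings. This helper
    is hypothetical and only mirrors the documented dict layout.
    """
    if isinstance(mount_paths, str):
        mount_paths = [mount_paths]
    if not mount_paths:
        raise ValueError("at least one mount path is required")
    return {
        name: {
            "type": "hostPath",
            "field": {"path": host_path, "type": "Directory"},
            "mounts": [{"mountPath": p} for p in mount_paths],
        }
    }


# Placeholder paths, for illustration only.
k8s_volumes = host_path_volume("my-data", "/data", ["/mnt/a", "/mnt/b"])
print(k8s_volumes)
```

The resulting dictionary could then be passed as the k8s_volumes keyword argument when creating a session.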
- Raises:
TypeError – If the given argument combination is invalid and cannot be used to create a GraphScope session.
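The difference between the eager and lazy values of the mode parameter can be illustrated with a toy model. ToyModeSession and LazyNode below are hypothetical classes written for this sketch; they do not reflect how GraphScope builds or executes its DAG, only the idea that lazy mode defers work until run(fetches) is called.

```python
class LazyNode:
    """A deferred computation: records a function and its inputs."""

    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = inputs


class ToyModeSession:
    """Hypothetical session that only models the `mode` switch."""

    def __init__(self, mode="eager"):
        if mode not in ("eager", "lazy"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode

    def op(self, func, *inputs):
        node = LazyNode(func, *inputs)
        # Eager mode evaluates immediately; lazy mode returns the DAG node.
        return self.run(node) if self.mode == "eager" else node

    def run(self, fetches):
        """Evaluate a node by first evaluating everything it depends on."""
        if not isinstance(fetches, LazyNode):
            return fetches  # already a concrete value
        args = [self.run(i) for i in fetches.inputs]
        return fetches.func(*args)


eager = ToyModeSession(mode="eager")
eager_result = eager.op(sum, [1, 2, 3])    # computed right away

lazy = ToyModeSession(mode="lazy")
total = lazy.op(sum, [1, 2, 3])            # nothing computed yet
doubled = lazy.op(lambda x: 2 * x, total)  # still only a DAG
lazy_result = lazy.run(doubled)            # the whole chain runs here
```

In the lazy case, `total` and `doubled` are only descriptions of work until run() walks the dependencies.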
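- as_default()
Obtain a context manager that makes this object the default session.
This method is used when a Session is constructed, and it immediately installs the session as the default session.
- Raises:
ValueError – If a default session already exists in the current context.
- Returns:
A context manager that uses this session as the default session.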
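The registration rules described for as_default() (one default at a time, a ValueError on a second registration, automatic unregistration on close) can be sketched with a small registry. RegistrySession is a hypothetical class for illustration and is simplified: its as_default() returns the session itself rather than a context manager.

```python
_default_session = None  # toy module-level registry, for illustration only


class RegistrySession:
    """Hypothetical session that models only default registration."""

    def as_default(self):
        global _default_session
        if _default_session is not None:
            raise ValueError("a default session already exists")
        _default_session = self
        return self

    def close(self):
        global _default_session
        # Closing unregisters the session if it is the current default.
        if _default_session is self:
            _default_session = None


def get_default():
    return _default_session


first = RegistrySession().as_default()
was_default = get_default() is first

try:
    RegistrySession().as_default()  # a default is already registered
    second_failed = False
except ValueError:
    second_failed = True

first.close()
```

After first.close(), the registry is empty again, so a new session could register itself as the default.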
- property engine_config
Show the engine configuration associated with the session in JSON format.
- g(incoming_data=None, oid_type='int64', vid_type='uint64', directed=True, generate_eid=True, retain_oid=True, vertex_map='global', compact_edges=False, use_perfect_hash=False) → Graph | GraphDAGNode
Construct a GraphScope graph object on the default session.
If no default session is found, one is launched and set as the default.
See graphscope.framework.graph.GraphDAGNode for parameter details.
- Returns:
Evaluated in eager mode.
- Return type:
Graph | GraphDAGNode
Examples:
>>> import graphscope
>>> g = graphscope.g()

>>> import graphscope
>>> sess = graphscope.session()
>>> g = sess.g()  # creating graph on the session "sess"
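- get_vineyard_object_mapping_table()
Get the vineyard object mapping table from old object IDs to new object IDs, produced while storing graphs to and restoring them from a PVC on the Kubernetes cluster.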
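- graphlearn(graph, nodes=None, edges=None, gen_labels=None)
Start a graph learning engine.
- Parameters:
graph (graphscope.framework.graph.GraphDAGNode) – The graph used to create the learning instance.
nodes (list, optional) – The list of node types used for GNN training. Each element can be "node_label" or (node_label, features). If an element is a tuple that contains a list of selected features, those features are used for training. Defaults to None, which means all node types are used for GNN training.
edges (list, optional) – The list of edge types used for GNN training. An edge type is specified as (src_label, edge_label, dst_label). Defaults to None, which means all edge types are used for GNN training.
gen_labels (list, optional) – Alias labels for nodes and edges used in supervised GNN training, which extract the training, validation and test datasets from the original graph. See the examples below for details.
Examples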
>>> # Assume the input graph contains one node label `paper` and one edge label `link`.
>>> features = ["weight", "name"]  # use properties "weight" and "name" as features
>>> lg = sess.graphlearn(
...     graph,
...     nodes=[("paper", features)],  # use "paper" nodes and features for training
...     edges=[("paper", "link", "paper")],  # use the `paper->link->paper` edge type for training
...     gen_labels=[
...         # split "paper" nodes into 100 pieces, and use 75 random pieces (75%) as the training dataset
...         ("train", "paper", 100, (0, 75)),
...         # split "paper" nodes into 100 pieces, and use 10 random pieces (10%) as the validation dataset
...         ("val", "paper", 100, (75, 85)),
...         # split "paper" nodes into 100 pieces, and use 15 random pieces (15%) as the test dataset
...         ("test", "paper", 100, (85, 100)),
...     ],
... )
Note that the training, validation and test datasets do not overlap.
And for unsupervised learning:
>>> lg = sess.graphlearn(
...     graph,
...     nodes=[("paper", features)],  # use "paper" nodes and features for training
...     edges=[("paper", "link", "paper")],  # use the `paper->link->paper` edge type for training
...     gen_labels=[
...         # split "paper" nodes into 100 pieces, and use all pieces as the training dataset
...         ("train", "paper", 100, (0, 100)),
...     ],
... )
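To see why the (0, 75), (75, 85) and (85, 100) ranges in the gen_labels example produce non-overlapping datasets, the sketch below assigns each node to one of 100 pieces and selects nodes by piece range. split_by_pieces is a hypothetical function. The random assignment and the half-open reading of each range as [start, end) are assumptions made for illustration, and GraphScope's actual partitioning rule may differ.

```python
import random


def split_by_pieces(node_ids, gen_labels, num_pieces=100, seed=0):
    """Assign each node to one of `num_pieces` pieces, then pick ranges.

    Each gen_labels entry is (name, label, pieces, (start, end)) and selects
    the nodes whose piece index lies in [start, end). The random assignment
    is an assumption for illustration only.
    """
    rng = random.Random(seed)
    piece_of = {n: rng.randrange(num_pieces) for n in node_ids}
    splits = {}
    for name, _label, pieces, (start, end) in gen_labels:
        if pieces != num_pieces:
            raise ValueError("all entries must use the same piece count")
        splits[name] = {n for n, p in piece_of.items() if start <= p < end}
    return splits


gen_labels = [
    ("train", "paper", 100, (0, 75)),
    ("val", "paper", 100, (75, 85)),
    ("test", "paper", 100, (85, 100)),
]
splits = split_by_pieces(range(1000), gen_labels)
print({name: len(ids) for name, ids in splits.items()})
```

Because every node belongs to exactly one piece and the three ranges partition the piece indices 0 to 99, the three sets are disjoint and together cover all nodes.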
- property info
Show all resource information associated with the session in JSON format.
- interactive(graph, params=None, with_cypher=False)
Get an interactive engine handler for executing gremlin and cypher queries.
It returns an instance of graphscope.interactive.query.InteractiveQuery.
>>> # close and recreate InteractiveQuery.
>>> interactive_query = sess.interactive(g)
>>> interactive_query.close()
>>> interactive_query = sess.interactive(g)
- Parameters:
graph (graphscope.framework.graph.GraphDAGNode) – The graph object used to create the interactive instance.
params – A dict containing the configuration of the GIE instance.
- Raises:
InvalidArgumentError – graph is not a property graph.
- Returns:
InteractiveQuery for executing gremlin and cypher queries.
- Return type:
graphscope.interactive.query.InteractiveQuery
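- restore_from_pvc(path: str, pvc_name: str)
Restore graphs from the given path in the PVC. Note that before calling this function, the KUBECONFIG environment variable must be set to the path of your kubeconfig file.
- Parameters:
path – The path in the PV to which the PVC is bound.
pvc_name – The name of the PVC.
- Raises:
RuntimeError – If the cluster type is not Kubernetes.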
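- store_to_pvc(graphIDs, path: str, pvc_name: str)
Store the given graph IDs to a persistent volume claim (PVC) under the specified path. If you want to store graphs from different sessions in the same persistent volume (PV), it is recommended to first create a separate PVC for each session.
Note that before calling this function, the KUBECONFIG environment variable must be set to the path of your kubeconfig file. Also make sure that the PVC is bound to the PV and that the PV has enough capacity to store the graphs.
The method uses vineyardctl to create a Kubernetes job that serializes the selected graphs. For more information, see the vineyardctl documentation:
https://github.com/v6d-io/v6d/tree/main/k8s/cmd#vineyardctl-deploy-backup-job
- Parameters:
graphIDs – The list of graph IDs to store. Supported types: a list of vineyard.ObjectID or graphscope.Graph.
path – The path in the PV to which the PVC is bound.
pvc_name – The name of the PVC.
- Raises:
RuntimeError – If the cluster type is not Kubernetes.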
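Both store_to_pvc() and restore_from_pvc() require the KUBECONFIG environment variable to point at a kubeconfig file. A pre-flight check along the following lines can surface a missing setting early. check_kubeconfig is a hypothetical helper, not a GraphScope function, and it assumes KUBECONFIG holds a single path rather than a list of paths.

```python
import os
import tempfile


def check_kubeconfig(environ=os.environ):
    """Return the kubeconfig path from KUBECONFIG, or raise if unusable.

    Assumption: KUBECONFIG contains exactly one path.
    """
    path = environ.get("KUBECONFIG")
    if not path:
        raise RuntimeError("KUBECONFIG is not set")
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        raise RuntimeError(f"kubeconfig file not found: {path}")
    return path


# Demonstration with a temporary file standing in for a real kubeconfig.
with tempfile.NamedTemporaryFile(suffix=".yaml", delete=False) as tmp:
    fake_config = tmp.name
checked = check_kubeconfig({"KUBECONFIG": fake_config})
os.remove(fake_config)
```

In real use, calling check_kubeconfig() with no arguments reads the process environment, and the call would sit just before store_to_pvc() or restore_from_pvc().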
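Session functions
- Returns the default session for the current context.
- True if a default session exists in the current context.
- Sets the value of the specified options.
- Constructs a GraphScope graph object on the default session.
- Creates a graph learning engine.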