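In-Memory Immutable Graphs on Vineyard
Vineyard is a distributed in-memory immutable data manager that serves as the storage backend for immutable graphs in GraphScope. Vineyard provides zero-copy data sharing through memory mapping, so the different compute engines in GraphScope can run on the same vineyard cluster and share graph data efficiently.
Graphs in Vineyard
Vineyard supports immutable property graphs and abstracts them as the vineyard::ArrowFragment class, which consists of a CSR structure for the edges plus tables that store the edge and vertex properties. On top of ArrowFragment, vineyard abstracts a distributed graph as vineyard::ArrowFragmentGroup, which is made up of a set of fragments distributed across the cluster.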
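To make the CSR idea concrete, the following is a minimal standalone sketch of a CSR edge layout whose entries point back into an edge property table. It does not use vineyard, and the names CsrGraph and BuildCsr are hypothetical; the real ArrowFragment layout may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical illustration of a CSR (compressed sparse row) edge layout.
// This is NOT vineyard::ArrowFragment; it only shows the general technique.
struct CsrGraph {
  std::vector<std::size_t> offsets;      // size = num_vertices + 1
  std::vector<std::uint32_t> neighbors;  // destination ids, grouped by source
  std::vector<std::size_t> edge_rows;    // row of each edge in a property table
};

// Builds the CSR arrays from an edge list; edges[i] is row i of the
// (imaginary) edge property table.
CsrGraph BuildCsr(
    std::uint32_t num_vertices,
    const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
  CsrGraph g;
  g.offsets.assign(static_cast<std::size_t>(num_vertices) + 1, 0);
  // Pass 1: count the out-degree of every source vertex.
  for (const auto& e : edges) {
    assert(e.first < num_vertices && e.second < num_vertices);
    ++g.offsets[e.first + 1];
  }
  // Prefix sum turns the counts into start offsets.
  for (std::size_t v = 0; v < num_vertices; ++v) {
    g.offsets[v + 1] += g.offsets[v];
  }
  // Pass 2: place each edge into the slot range of its source vertex.
  g.neighbors.resize(edges.size());
  g.edge_rows.resize(edges.size());
  std::vector<std::size_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
  for (std::size_t row = 0; row < edges.size(); ++row) {
    const std::size_t pos = cursor[edges[row].first]++;
    g.neighbors[pos] = edges[row].second;
    g.edge_rows[pos] = row;
  }
  return g;
}
```

The out-edges of vertex v occupy the index range [offsets[v], offsets[v + 1]) in neighbors, and edge_rows maps each slot back to the row holding that edge's properties.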
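Loading Graph Data into Vineyard
Vineyard can be deployed as a standalone service or launched together with GraphScope.
A command-line tool, vineyard-graph-loader, is provided for loading fragments into vineyard. It first accepts an optional --socket argument, which specifies the IPC socket the loader connects to. If the argument is omitted, the value is resolved from the environment variable VINEYARD_IPC_SOCKET. The rest of the configuration is given either as a list of command-line arguments or as a JSON file.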
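The lookup order described above (explicit --socket first, then the environment variable) can be sketched as follows. ResolveIpcSocket is a hypothetical helper written for illustration; it is not taken from the loader's source, and the handling of a missing value (returning an empty string) is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper mirroring the documented lookup order.
// `args` holds the command-line arguments without the program name.
std::string ResolveIpcSocket(const std::vector<std::string>& args) {
  // 1. An explicit "--socket <path>" at the front of the arguments wins.
  if (args.size() >= 2 && args[0] == "--socket") {
    assert(!args[1].empty() && "--socket needs a non-empty path");
    return args[1];
  }
  // 2. Otherwise fall back to the VINEYARD_IPC_SOCKET environment variable.
  const char* env = std::getenv("VINEYARD_IPC_SOCKET");
  // Assumption: an empty string signals "no socket configured".
  return env != nullptr ? std::string(env) : std::string();
}
```

Returning an empty string leaves it to the caller to decide whether a missing socket is an error.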
$ vineyard-graph-loader --help
Usage: loading vertices and edges as vineyard graph.
- ./vineyard-graph-loader [--socket <vineyard-ipc-socket>] \
<e_label_num> <efiles...> <v_label_num> <vfiles...> \
[directed] [generate_eid] [retain_oid] [string_oid]
- or: ./vineyard-graph-loader [--socket <vineyard-ipc-socket>] --config <config.json>
The config is a json file and should look like
{
    "vertices": [
        {
            "data_path": "....",
            "label": "...",
            "options": "...."
        },
        ...
    ],
    "edges": [
        {
            "data_path": "",
            "label": "",
            "src_label": "",
            "dst_label": "",
            "options": ""
        },
        ...
    ],
    "directed": 1, # 0 or 1
    "generate_eid": 1, # 0 or 1
    "retain_oid": 1, # 0 or 1
    "string_oid": 0, # 0 or 1
    "local_vertex_map": 0 # 0 or 1
}
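The options that control how the graph is built include:
directed: whether the graph is directed or undirected.
generate_eid: whether to generate a globally unique edge ID for each edge.
retain_oid: whether to retain the original vertex ID in the final vertex property table.
string_oid: whether the vertex IDs are of string type.
local_vertex_map: whether to use a local vertex map while building the graph, typically to reduce memory usage.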
The modern graph can be loaded with vineyard-graph-loader in either of the following ways:
Using command-line arguments
vineyard-graph-loader accepts a sequence of command-line arguments that specify the edge files and vertex files, for example:
$ ./vineyard-graph-loader 2 "modern_graph/knows.csv#header_row=true&src_label=person&dst_label=person&label=knows&delimiter=|" \
    "modern_graph/created.csv#header_row=true&src_label=person&dst_label=software&label=created&delimiter=|" \
    2 "modern_graph/person.csv#header_row=true&label=person&delimiter=|" \
    "modern_graph/software.csv#header_row=true&label=software&delimiter=|"
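Each file argument above has the shape path#key=value&key=value. As a reading aid, here is a minimal standalone sketch of how such a string can be split into a path and an option map. FileSpec and ParseFileSpec are hypothetical names, and this is not the loader's actual parser, which may treat escaping or unknown keys differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical container for one "path#key=value&key=value" argument.
struct FileSpec {
  std::string path;
  std::map<std::string, std::string> options;
};

// Splits the argument at the first '#', then splits the remainder on '&'
// and each item on its first '='.
FileSpec ParseFileSpec(const std::string& spec) {
  FileSpec out;
  const std::size_t hash = spec.find('#');
  out.path = spec.substr(0, hash);  // whole string when there is no '#'
  if (hash == std::string::npos) {
    return out;
  }
  std::size_t begin = hash + 1;
  while (begin <= spec.size()) {
    std::size_t end = spec.find('&', begin);
    if (end == std::string::npos) {
      end = spec.size();
    }
    const std::string item = spec.substr(begin, end - begin);
    if (!item.empty()) {
      const std::size_t eq = item.find('=');
      assert(eq != std::string::npos && "expected key=value");
      out.options[item.substr(0, eq)] = item.substr(eq + 1);
    }
    begin = end + 1;
  }
  return out;
}
```

For the knows.csv argument shown above, this yields the path modern_graph/knows.csv and five options, including label=knows and delimiter=|.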
Using a JSON configuration file
$ ./vineyard-graph-loader --config config.json
An example JSON configuration file for the modern graph looks like:
{
    "vertices": [
        {
            "data_path": "modern_graph/person.csv",
            "label": "person",
            "options": "header_row=true&delimiter=|"
        },
        {
            "data_path": "modern_graph/software.csv",
            "label": "software",
            "options": "header_row=true&delimiter=|"
        }
    ],
    "edges": [
        {
            "data_path": "modern_graph/knows.csv",
            "label": "knows",
            "src_label": "person",
            "dst_label": "person",
            "options": "header_row=true&delimiter=|"
        },
        {
            "data_path": "modern_graph/created.csv",
            "label": "created",
            "src_label": "person",
            "dst_label": "software",
            "options": "header_row=true&delimiter=|"
        }
    ],
    "directed": 1,
    "generate_eid": 1,
    "string_oid": 0,
    "local_vertex_map": 0
}
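Using the Loaded Graph
Once the graph is loaded into vineyard, the loaded fragments can be accessed through vineyard's IPC client (vineyard::Client in the code below):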
void WriteOut(vineyard::Client& client, const grape::CommSpec& comm_spec,
              vineyard::ObjectID fragment_group_id) {
  LOG(INFO) << "Loaded graph to vineyard: " << fragment_group_id;
  std::shared_ptr<vineyard::ArrowFragmentGroup> fg =
      std::dynamic_pointer_cast<vineyard::ArrowFragmentGroup>(
          client.GetObject(fragment_group_id));
  for (const auto& pair : fg->Fragments()) {
    LOG(INFO) << "[frag-" << pair.first << "]: " << pair.second;
  }
  // NB: only retrieve local fragments.
  auto locations = fg->FragmentLocations();
  for (const auto& pair : fg->Fragments()) {
    if (locations.at(pair.first) != client.instance_id()) {
      continue;
    }
    auto frag_id = pair.second;
    Traverse(client, frag_id);
  }
}
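The "only retrieve local fragments" step is a filter: a fragment is kept only when its recorded location equals the instance the client is connected to. The sketch below isolates that logic with plain standard-library types so it can be run without vineyard. The aliases FragmentId, ObjectId and InstanceId and the function LocalFragments are hypothetical stand-ins, not vineyard types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for the id types used by the fragment group.
using FragmentId = std::uint32_t;
using ObjectId = std::uint64_t;
using InstanceId = std::uint64_t;

// Returns the object ids of the fragments located on `local_instance`.
std::vector<ObjectId> LocalFragments(
    const std::map<FragmentId, ObjectId>& fragments,
    const std::map<FragmentId, InstanceId>& locations,
    InstanceId local_instance) {
  std::vector<ObjectId> result;
  for (const auto& pair : fragments) {
    const auto it = locations.find(pair.first);
    // Every fragment is expected to have a recorded location.
    assert(it != locations.end());
    if (it->second == local_instance) {
      result.push_back(pair.second);
    }
  }
  return result;
}
```

With zero-copy sharing over memory mapping, a client can only map objects that live on its own instance, which is why remote fragments are skipped rather than fetched.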
The API of vineyard::ArrowFragment can then be used to traverse a local fragment:
// GraphType and LabelType are not defined in this snippet; they are assumed
// to be aliases for a vineyard::ArrowFragment instantiation and its label id
// type, declared elsewhere.
void Traverse(vineyard::Client& client, vineyard::ObjectID frag_id) {
  auto frag = std::dynamic_pointer_cast<GraphType>(client.GetObject(frag_id));

  // Global and per-fragment statistics.
  LOG(INFO) << "graph total node number: " << frag->GetTotalNodesNum();
  LOG(INFO) << "fragment edge number: " << frag->GetEdgeNum();
  LOG(INFO) << "fragment in edge number: " << frag->GetInEdgeNum();
  LOG(INFO) << "fragment out edge number: " << frag->GetOutEdgeNum();

  // Print the schema of every vertex and edge property table.
  for (LabelType vlabel = 0; vlabel < frag->vertex_label_num(); ++vlabel) {
    LOG(INFO) << "vertex table: " << vlabel << " -> "
              << frag->vertex_data_table(vlabel)->schema()->ToString();
  }
  for (LabelType elabel = 0; elabel < frag->edge_label_num(); ++elabel) {
    LOG(INFO) << "edge table: " << elabel << " -> "
              << frag->edge_data_table(elabel)->schema()->ToString();
  }

  LOG(INFO) << "--------------- consolidate vertex/edge table columns ...";
  // NOTE: no consolidation call appears in this snippet; the blocks below only
  // re-print the schemas when table 0 has at least 4 columns.
  if (frag->vertex_data_table(0)->columns().size() >= 4) {
    for (LabelType vlabel = 0; vlabel < frag->vertex_label_num(); ++vlabel) {
      LOG(INFO) << "vertex table: " << vlabel << " -> "
                << frag->vertex_data_table(vlabel)->schema()->ToString();
    }
  }
  if (frag->edge_data_table(0)->columns().size() >= 4) {
    for (LabelType elabel = 0; elabel < frag->edge_label_num(); ++elabel) {
      LOG(INFO) << "edge table: " << elabel << " -> "
                << frag->edge_data_table(elabel)->schema()->ToString();
    }
  }
}