Running Faiss on the GPU

Faiss can use NVIDIA GPUs to accelerate vector search with only a few extra lines of code.

Declaring GPU resources

First, declare a GPU resource object, which encapsulates a chunk of GPU memory.

Python example

import faiss

res = faiss.StandardGpuResources()  # use a single GPU

C++ example

faiss::gpu::StandardGpuResources res;  // 使用单个 GPU

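Building a GPU index

Next, use the GPU resource created above to build a GPU index. Here d is the vector dimension.

Python example

# build a flat (uncompressed) CPU index
index_flat = faiss.IndexFlatL2(d)
# copy it to GPU 0 (the second argument is the GPU device number)
gpu_index_flat = faiss.index_cpu_to_gpu(res, 0, index_flat)

C++ example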

faiss::gpu::GpuIndexFlatL2 gpu_index_flat(&res, d);
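A flat index keeps every vector uncompressed, so the GPU memory it needs grows linearly with the number of vectors. The sketch below is a back-of-the-envelope estimate of the raw vector storage only, assuming 4-byte float32 components (the default storage of IndexFlatL2 and GpuIndexFlatL2). It ignores allocator overhead and the temporary memory held by the resource object, and the helper names are hypothetical.

```python
def flat_index_bytes(n_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Raw storage needed for n_vectors uncompressed vectors of dimension dim.

    Hypothetical helper: assumes float32 components (4 bytes each) by default
    and ignores any allocator or temporary-memory overhead.
    """
    return n_vectors * dim * bytes_per_component


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    # One million 128-dimensional float32 vectors.
    size = flat_index_bytes(1_000_000, 128)
    print(size, f"{to_mib(size):.1f} MiB")  # prints: 512000000 488.3 MiB
```

For one million 128-dimensional vectors this gives 512,000,000 bytes, roughly 488 MiB, which is a quick way to check whether a flat index is likely to fit on a given card before copying it there.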

Note

A single GPU resource object can be shared by several indexes, as long as they do not issue queries concurrently.
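If several threads need to query indexes that share one resource object, one way to respect the no-concurrent-queries rule is to serialize the searches with a lock (another is to give each thread its own StandardGpuResources object). The sketch below shows the locking pattern with the standard threading module. FakeResources, FakeIndex and SerializedSearcher are hypothetical stand-ins: the fake index only mimics the search(xq, k) call shape so the example runs without Faiss or a GPU, and in real code the wrapped objects would be GPU indexes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeResources:
    """Hypothetical stand-in for one shared GPU resource object.

    It only counts how many searches are using it at the same moment.
    """

    def __init__(self):
        self._counter_lock = threading.Lock()
        self.in_use = 0
        self.max_in_use = 0

    def enter(self):
        with self._counter_lock:
            self.in_use += 1
            self.max_in_use = max(self.max_in_use, self.in_use)

    def leave(self):
        with self._counter_lock:
            self.in_use -= 1


class FakeIndex:
    """Hypothetical stand-in for a GPU index that uses a shared resource."""

    def __init__(self, res):
        self.res = res

    def search(self, xq, k):
        self.res.enter()
        try:
            time.sleep(0.01)  # simulate some work; no GPU is involved
            D = [[0.0] * k for _ in xq]
            I = [[-1] * k for _ in xq]
            return D, I
        finally:
            self.res.leave()


class SerializedSearcher:
    """Routes searches to indexes sharing one resource object,
    allowing only one search at a time."""

    def __init__(self, indexes):
        self.indexes = indexes
        self._search_lock = threading.Lock()

    def search(self, which, xq, k):
        with self._search_lock:  # serialize all queries on the shared resource
            return self.indexes[which].search(xq, k)


if __name__ == "__main__":
    res = FakeResources()
    searcher = SerializedSearcher([FakeIndex(res), FakeIndex(res)])
    queries = [[0.1, 0.2], [0.3, 0.4]]
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(searcher.search, i % 2, queries, 4) for i in range(16)]
        results = [f.result() for f in futures]
    print("max concurrent searches:", res.max_in_use)  # prints 1
```

The lock trades parallelism for safety: queries from different threads run one after another, so this suits workloads where each search is large enough to keep the GPU busy on its own.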

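Using a GPU index

A GPU index is used in exactly the same way as a CPU index. In the examples below, xb holds the nb database vectors and xq the nq query vectors; both are assumed to be prepared beforehand.

Python example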

gpu_index_flat.add(xb)         # add vectors to the index
print(gpu_index_flat.ntotal)

k = 4                          # search for the 4 nearest neighbors
D, I = gpu_index_flat.search(xq, k)  # run the search
print(I[:5])                   # neighbors of the first 5 queries
print(I[-5:])                  # neighbors of the last 5 queries

C++ example

gpu_index_flat.add(nb, xb);  // add vectors to the index
printf("ntotal = %ld\n", gpu_index_flat.ntotal);

int k = 4;
{   // search xq
    idx_t *I = new idx_t[k * nq];
    float *D = new float[k * nq];

    gpu_index_flat.search(nq, xq, k, D, I);

    // print the results of the first 5 queries
    printf("I (5 first results)=\n");
    for(int i = 0; i < 5; i++) {
        for(int j = 0; j < k; j++)
            printf("%5ld ", I[i * k + j]);
        printf("\n");
    }

    // print the results of the last 5 queries
    printf("I (5 last results)=\n");
    for(int i = nq - 5; i < nq; i++) {
        for(int j = 0; j < k; j++)
            printf("%5ld ", I[i * k + j]);
        printf("\n");
    }

    delete [] I;
    delete [] D;
}
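In the C++ API, search writes its results into flat arrays of nq * k entries stored row by row: the j-th neighbor of query i sits at offset i * k + j, which is what the printing loops above index into (and why the last loop assumes nq is at least 5). The Python sketch below reproduces that indexing on a plain list so the layout can be checked without Faiss; row_major_rows is a hypothetical helper.

```python
def row_major_rows(flat, nq, k):
    """Split a flat buffer of nq * k results into nq rows of k entries.

    Entry j of row i is flat[i * k + j], matching the I[i * k + j]
    indexing used by the C++ printing loops.
    """
    if len(flat) != nq * k:
        raise ValueError("buffer must hold exactly nq * k entries")
    return [[flat[i * k + j] for j in range(k)] for i in range(nq)]


if __name__ == "__main__":
    nq, k = 7, 4
    flat_ids = list(range(nq * k))  # dummy ids 0..27 standing in for I
    rows = row_major_rows(flat_ids, nq, k)
    for row in rows[:5]:    # first 5 queries, like the first C++ loop
        print(" ".join(f"{x:5d}" for x in row))
    for row in rows[-5:]:   # queries nq - 5 .. nq - 1, like the second loop
        print(" ".join(f"{x:5d}" for x in row))
```

The same row-major layout applies to the distance array D.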

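About the search results

The search results are the same as those of the CPU version. Note that on a small dataset the speedup may not be noticeable.

Tip

GPU acceleration pays off most on large datasets (for example, a million vectors or more). On small datasets, the overhead of data transfer and initialization cancels out part of the gain from faster computation.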
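Because IndexFlatL2 performs exact (brute-force) search, a small pure-Python reference is enough to sanity-check GPU results on a toy dataset. The sketch below computes squared L2 distances, which is what IndexFlatL2 reports in D, and keeps the k smallest per query. The function name is hypothetical, it is far too slow for real data, and the order of equal distances may differ from Faiss.

```python
import heapq


def exact_knn_l2(xb, xq, k):
    """Brute-force k-nearest-neighbor search under squared L2 distance.

    Returns (D, I) as lists of lists: D[i][j] is the squared distance from
    query i to its j-th nearest database vector and I[i][j] is that vector's
    position in xb. If xb has fewer than k vectors, the remaining slots are
    padded with distance inf and id -1.
    """
    D, I = [], []
    for q in xq:
        # (squared distance, id) pairs; ties are broken by the smaller id
        pairs = ((sum((a - b) ** 2 for a, b in zip(q, v)), idx)
                 for idx, v in enumerate(xb))
        best = heapq.nsmallest(k, pairs)  # sorted by ascending distance
        best += [(float("inf"), -1)] * (k - len(best))
        D.append([d for d, _ in best])
        I.append([i for _, i in best])
    return D, I


if __name__ == "__main__":
    xb = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
    xq = [[0.1, 0.0], [2.9, 3.0]]
    D, I = exact_knn_l2(xb, xq, k=2)
    print(I)  # [[0, 1], [3, 2]]
```

To compare against a GPU index on a small dataset, convert its result with I.tolist() and compare ids, allowing for tie order and for tiny floating-point differences in D.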

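Using multiple GPUs

If several GPUs are available, they can be used together to further improve performance. The basic approach is to declare one GPU resource object per GPU. Python users can simplify this with the index_cpu_to_all_gpus helper function.

Python example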

ngpus = faiss.get_num_gpus()

print("number of GPUs:", ngpus)

cpu_index = faiss.IndexFlatL2(d)

gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)  # copy the index to all GPUs

gpu_index.add(xb)              # add vectors
print(gpu_index.ntotal)

k = 4                          # search for the 4 nearest neighbors
D, I = gpu_index.search(xq, k) # run the search
print(I[:5])                   # neighbors of the first 5 queries
print(I[-5:])                  # neighbors of the last 5 queries

C++ example

int ngpus = faiss::gpu::getNumDevices();

printf("Number of GPUs: %d\n", ngpus);

// StandardGpuResources derives from GpuResourcesProvider in recent Faiss versions
std::vector<faiss::gpu::GpuResourcesProvider*> res;
std::vector<int> devs;
for(int i = 0; i < ngpus; i++) {
    res.push_back(new faiss::gpu::StandardGpuResources);
    devs.push_back(i);
}

faiss::IndexFlatL2 cpu_index(d);

faiss::Index *gpu_index =
    faiss::gpu::index_cpu_to_gpu_multiple(
        res,
        devs,
        &cpu_index
    );

printf("is_trained = %s\n", gpu_index->is_trained ? "true" : "false");
gpu_index->add(nb, xb);  // add vectors to the index
printf("ntotal = %ld\n", gpu_index->ntotal);

int k = 4;

{   // search xq
    idx_t *I = new idx_t[k * nq];
    float *D = new float[k * nq];

    gpu_index->search(nq, xq, k, D, I);

    // print the results of the first 5 queries
    printf("I (5 first results)=\n");
    for(int i = 0; i < 5; i++) {
        for(int j = 0; j < k; j++)
            printf("%5ld ", I[i * k + j]);
        printf("\n");
    }

    // print the results of the last 5 queries
    printf("I (5 last results)=\n");
    for(int i = nq - 5; i < nq; i++) {
        for(int j = 0; j < k; j++)
            printf("%5ld ", I[i * k + j]);
        printf("\n");
    }

    delete [] I;
    delete [] D;
}

delete gpu_index;

for(int i = 0; i < ngpus; i++) {
    delete res[i];
}
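By default, index_cpu_to_all_gpus and index_cpu_to_gpu_multiple place a full copy of the index on every GPU (replica mode) and divide each batch of queries among the copies. The sketch below shows one way to cut nq queries into contiguous, nearly equal slices, one per GPU. It illustrates the idea only and is not Faiss's internal code; split_queries is a hypothetical helper.

```python
def split_queries(nq, ngpus):
    """Return (start, end) query ranges, one per GPU, covering 0..nq.

    The first nq % ngpus slices get one extra query, so slice sizes
    differ by at most one.
    """
    if ngpus <= 0:
        raise ValueError("need at least one GPU")
    base, extra = divmod(nq, ngpus)
    ranges, start = [], 0
    for g in range(ngpus):
        size = base + (1 if g < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    print(split_queries(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Since every replica holds the whole dataset, each slice can be answered independently and the per-GPU results simply concatenated in query order.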

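Important

For multi-GPU deployments, distribute data and computation evenly across all GPUs to make full use of the hardware and to improve query concurrency and throughput.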
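When the database is instead split across GPUs (shard mode, for example by setting shard to true in GpuMultipleClonerOptions), each GPU returns its own top-k for a query and the partial lists must be merged into a global top-k. The sketch below shows that merge step for a single query with heapq. merge_shard_results is hypothetical, and it assumes each shard already reports global ids; -1 ids, which Faiss uses for missing results, are skipped.

```python
import heapq
from typing import List, Tuple


def merge_shard_results(per_shard: List[Tuple[List[float], List[int]]], k: int):
    """Merge per-shard (distances, ids) lists for one query into a global top-k.

    Smaller distances are better (as with L2). Ids are assumed to be global;
    entries with id -1 (missing results) are ignored.
    """
    candidates = (
        (d, i)
        for dists, ids in per_shard
        for d, i in zip(dists, ids)
        if i != -1
    )
    best = heapq.nsmallest(k, candidates)  # ascending by distance
    return [d for d, _ in best], [i for _, i in best]


if __name__ == "__main__":
    shard0 = ([0.5, 1.2, 3.0], [10, 4, 7])       # results from GPU 0
    shard1 = ([0.7, 0.9, 5.0], [105, 120, 99])   # results from GPU 1
    D, I = merge_shard_results([shard0, shard1], k=3)
    print(D, I)  # [0.5, 0.7, 0.9] [10, 105, 120]
```

Sharding lets an index larger than one GPU's memory be searched, at the cost of this extra merge; replication keeps searches independent but needs the whole index on every card.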
