OceanBase Vector Store

OceanBase Database is a distributed relational database developed entirely by Ant Group. It is built on clusters of commodity servers and, based on the Paxos protocol and its distributed architecture, provides high availability and linear scalability without relying on any specific hardware architecture.

This notebook walks through how to use the OceanBase vector store functionality in LlamaIndex.
```python
%pip install llama-index-vector-stores-oceanbase
%pip install llama-index
# choose DashScope as the embedding and LLM model; you can also use the default OpenAI or another model to test
%pip install llama-index-embeddings-dashscope
%pip install llama-index-llms-dashscope
```
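The DashScope integrations expect an API key; the LLM constructed below reads it from the `DASHSCOPE_API_KEY` environment variable. A minimal sketch of setting it inside the notebook (the placeholder value is an assumption; substitute your own key):

```python
import os

# assumed placeholder; replace with your actual DashScope API key
os.environ["DASHSCOPE_API_KEY"] = "your-dashscope-api-key"
```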
Deploy a standalone OceanBase server with Docker

```python
!docker run --name=ob433 -e MODE=slim -p 2881:2881 -d oceanbase/oceanbase-ce:4.3.3.0-100000142024101215
```
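The container takes a little while to bootstrap before it accepts connections. As a sketch (assuming the `ob433` container name used above), you can watch the container logs for OceanBase's boot message before proceeding:

```python
# the oceanbase-ce image logs a "boot success!" line once initialization finishes
!docker logs ob433 | tail -n 1
```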
Creating ObVecClient

```python
from pyobvector import ObVecClient

client = ObVecClient()
client.perform_raw_text_sql(
    "ALTER SYSTEM ob_vector_memory_limit_percentage = 30"
)
```
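`ObVecClient()` connects with pyobvector's defaults, which match the Docker deployment above. If your server runs elsewhere or uses different credentials, the connection can be configured explicitly; the parameter values below are illustrative assumptions, not requirements:

```python
# a sketch with explicit connection parameters; adjust to your deployment
client = ObVecClient(
    uri="127.0.0.1:2881",  # host:port of the OceanBase server
    user="root@test",      # user@tenant
    password="",
    db_name="test",
)
```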
Configure the DashScope embedding model and LLM.

```python
# set embedding model
import os

from llama_index.core import Settings
from llama_index.embeddings.dashscope import DashScopeEmbedding

# global settings
Settings.embed_model = DashScopeEmbedding()
```
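The vector store created later is configured with `dim=1536`, which must match the embedding model's output dimension. A quick sanity check (a sketch, assuming `DASHSCOPE_API_KEY` is set):

```python
# embed a short string and confirm the vector length matches the dim=1536 used below
vec = Settings.embed_model.get_text_embedding("hello, OceanBase")
print(len(vec))
```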
```python
# config LLM model
from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels

dashscope_llm = DashScope(
    model_name=DashScopeGenerationModels.QWEN_MAX,
    api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
)
```

```python
from llama_index.core import (
    SimpleDirectoryReader,
    load_index_from_storage,
    VectorStoreIndex,
    StorageContext,
)
from llama_index.vector_stores.oceanbase import OceanBaseVectorStore
```
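Optionally, a one-line smoke test that the LLM is reachable (a sketch, assuming a valid `DASHSCOPE_API_KEY`):

```python
# single completion call to confirm the DashScope LLM responds
print(dashscope_llm.complete("Say hello in one short sentence."))
```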
Download & Load Data

```python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
```

```python
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

oceanbase = OceanBaseVectorStore(
    client=client,
    dim=1536,
    drop_old=True,
    normalize=True,
)
```
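Note that `drop_old=True` recreates the underlying table, wiping any previously stored vectors. To reconnect to an already-populated store in a later session, a sketch like the following should work, keeping the same `dim` but passing `drop_old=False`:

```python
# rebuild an index on top of an existing, already-populated vector store
existing_store = OceanBaseVectorStore(
    client=client,
    dim=1536,
    drop_old=False,  # keep the existing table and its vectors
    normalize=True,
)
existing_index = VectorStoreIndex.from_vector_store(vector_store=existing_store)
```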
```python
storage_context = StorageContext.from_defaults(vector_store=oceanbase)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
```

```python
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine(llm=dashscope_llm)
res = query_engine.query("What did the author do growing up?")
res.response
```

```
'Growing up, the author worked on two main activities outside of school: writing and programming. They wrote short stories, which they admit were not particularly good, lacking plot but containing characters with strong emotions. They also started programming at a young age, initially on an IBM 1401 computer using an early version of Fortran, though they found it challenging due to the limitations of punch card input and their lack of data to process. Their programming journey truly took off when microcomputers became available, allowing them to write more interactive programs such as games, a rocket flight predictor, and a simple word processor.'
```

Metadata Filtering

OceanBase Vector Store supports metadata filtering at query time, with operators in the form of `=`, `>`, `<`, `!=`, `>=`, `<=`, `in`, `not in`, `like`, and `IS NULL`.
```python
from llama_index.core.vector_stores import (
    MetadataFilters,
    MetadataFilter,
)

query_engine = index.as_query_engine(
    llm=dashscope_llm,
    filters=MetadataFilters(
        filters=[
            MetadataFilter(key="book", value="paul_graham", operator="!="),
        ]
    ),
    similarity_top_k=10,
)
```
```python
res = query_engine.query("What did the author learn?")
res.response
```

```
'Empty Response'
```
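The string operators above can also be written with the `FilterOperator` enum; a sketch of the same `!=` filter in that style:

```python
from llama_index.core.vector_stores import (
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
)

# equivalent filter expressed with the FilterOperator enum instead of a raw string
filters = MetadataFilters(
    filters=[
        MetadataFilter(
            key="book", value="paul_graham", operator=FilterOperator.NE
        ),
    ]
)
```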
Delete Documents

Since the essay loaded above is the only document in the store, deleting it by `doc_id` empties the index, as the follow-up query confirms.

```python
oceanbase.delete(documents[0].doc_id)

query_engine = index.as_query_engine(llm=dashscope_llm)
res = query_engine.query("What did the author do growing up?")
res.response
```

```
'Empty Response'
```