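Node Postprocessor
Node postprocessors are a set of modules that take a set of nodes and apply some kind of transformation or filtering before returning them.
In LlamaIndex, node postprocessors are most commonly applied within a query engine, after the node retrieval step and before the response synthesis step.
LlamaIndex offers several node postprocessors for immediate use, while also providing a simple API for adding your own custom postprocessors.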
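To make the position of postprocessing in the query flow concrete, here is a minimal, library-free sketch. `ScoredChunk`, `drop_below`, `keep_top`, and `run_query` are hypothetical names invented for illustration, not LlamaIndex APIs; the sketch only assumes that each postprocessor maps a list of scored nodes to another list of scored nodes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoredChunk:
    """Hypothetical stand-in for a retrieved node with a similarity score."""

    text: str
    score: float


# In this sketch a postprocessor is just a function from node list to node list.
Postprocessor = Callable[[List[ScoredChunk]], List[ScoredChunk]]


def drop_below(cutoff: float) -> Postprocessor:
    """Build a filter that keeps only chunks scoring at least `cutoff`."""

    def apply(chunks: List[ScoredChunk]) -> List[ScoredChunk]:
        return [c for c in chunks if c.score >= cutoff]

    return apply


def keep_top(n: int) -> Postprocessor:
    """Build a transform that keeps the `n` highest-scoring chunks."""

    def apply(chunks: List[ScoredChunk]) -> List[ScoredChunk]:
        return sorted(chunks, key=lambda c: c.score, reverse=True)[:n]

    return apply


def run_query(
    retrieved: List[ScoredChunk],
    postprocessors: List[Postprocessor],
    synthesize: Callable[[List[ScoredChunk]], str],
) -> str:
    """Retrieval output -> each postprocessor in order -> response synthesis."""
    chunks = retrieved
    for postprocess in postprocessors:
        chunks = postprocess(chunks)
    return synthesize(chunks)


retrieved = [ScoredChunk("a", 0.9), ScoredChunk("b", 0.6), ScoredChunk("c", 0.8)]
answer = run_query(
    retrieved,
    [drop_below(0.75), keep_top(1)],
    synthesize=lambda chunks: " ".join(c.text for c in chunks),
)
print(answer)
```

The postprocessors run in list order, so the filter removes low-scoring chunks before the top-N step sees them.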
Below is an example of using node postprocessors:
    from llama_index.core.postprocessor import SimilarityPostprocessor
    from llama_index.postprocessor.cohere_rerank import CohereRerank
    from llama_index.core.data_structs import Node
    from llama_index.core.schema import NodeWithScore

    nodes = [
        NodeWithScore(node=Node(text="text1"), score=0.7),
        NodeWithScore(node=Node(text="text2"), score=0.8),
    ]

    # similarity postprocessor: filter nodes below 0.75 similarity score
    processor = SimilarityPostprocessor(similarity_cutoff=0.75)
    filtered_nodes = processor.postprocess_nodes(nodes)

    # cohere rerank: rerank nodes given query using trained model
    reranker = CohereRerank(api_key="<COHERE_API_KEY>", top_n=2)
    reranker.postprocess_nodes(nodes, query_str="<user_query>")

Note that postprocess_nodes accepts either a query_str or a query_bundle (QueryBundle), but not both.
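The "either query_str or query_bundle, but not both" rule can be pictured as a small normalization step. The following is only a sketch of that rule under stated assumptions, not the library's actual implementation: `QueryBundleSketch` and `resolve_query` are hypothetical names, and raising `ValueError` is an assumption about how the conflict might be reported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryBundleSketch:
    """Hypothetical stand-in for QueryBundle: it only wraps the query text."""

    query_str: str


def resolve_query(
    query_bundle: Optional[QueryBundleSketch] = None,
    query_str: Optional[str] = None,
) -> Optional[QueryBundleSketch]:
    """Normalize the two mutually exclusive query arguments into one bundle."""
    if query_bundle is not None and query_str is not None:
        # Assumption: a conflicting call is rejected rather than silently resolved.
        raise ValueError("Pass either query_str or query_bundle, not both.")
    if query_str is not None:
        return QueryBundleSketch(query_str=query_str)
    # May be None, for postprocessors that do not need the query at all.
    return query_bundle
```

Normalizing to a single bundle type means the rest of a postprocessor only has to handle one representation of the query.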
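Most commonly, node postprocessors are used in a query engine, where they are applied to the nodes returned from a retriever, before the response synthesis step: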
    from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
    from llama_index.core.postprocessor import TimeWeightedPostprocessor

    documents = SimpleDirectoryReader("./data").load_data()

    index = VectorStoreIndex.from_documents(documents)

    query_engine = index.as_query_engine(
        node_postprocessors=[
            TimeWeightedPostprocessor(
                time_decay=0.5, time_access_refresh=False, top_k=1
            )
        ]
    )

    # all node post-processors will be applied during each query
    response = query_engine.query("query string")

Or they can be used as standalone objects to filter retrieved nodes:
    from llama_index.core.postprocessor import SimilarityPostprocessor

    nodes = index.as_retriever().retrieve("test query str")

    # filter nodes below 0.75 similarity score
    processor = SimilarityPostprocessor(similarity_cutoff=0.75)
    filtered_nodes = processor.postprocess_nodes(nodes)

Using with your own nodes
As you may have noticed, the postprocessors take NodeWithScore objects as input, which is just a wrapper class holding a Node and a score value.
    from llama_index.core.postprocessor import SimilarityPostprocessor
    from llama_index.core.data_structs import Node
    from llama_index.core.schema import NodeWithScore

    nodes = [
        NodeWithScore(node=Node(text="text"), score=0.7),
        NodeWithScore(node=Node(text="text"), score=0.8),
    ]

    # filter nodes below 0.75 similarity score
    processor = SimilarityPostprocessor(similarity_cutoff=0.75)
    filtered_nodes = processor.postprocess_nodes(nodes)
The base class is BaseNodePostprocessor, and its API interface is very simple:
    class BaseNodePostprocessor:
        """Node postprocessor."""

        @abstractmethod
        def _postprocess_nodes(
            self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle]
        ) -> List[NodeWithScore]:
            """Postprocess nodes."""

A dummy node postprocessor can be implemented in just a few lines of code:
    from typing import List, Optional

    from llama_index.core import QueryBundle
    from llama_index.core.postprocessor.types import BaseNodePostprocessor
    from llama_index.core.schema import NodeWithScore


    class DummyNodePostprocessor(BaseNodePostprocessor):
        def _postprocess_nodes(
            self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle]
        ) -> List[NodeWithScore]:
            # subtracts 1 from the score
            for n in nodes:
                n.score -= 1

            return nodes

See the full module list for more details.
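As a further illustration of the custom postprocessor pattern, the sketch below shows a postprocessor that actually uses the query it is given. To stay runnable without LlamaIndex installed, it uses stand-in classes: `SketchNode`, `SketchQuery`, `SketchPostprocessor`, and `QueryTermFilter` are all hypothetical names that only mirror the shape of the interface described above. With the real library, you would presumably subclass BaseNodePostprocessor and work with NodeWithScore and QueryBundle instead.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a scored node."""

    text: str
    score: float


@dataclass
class SketchQuery:
    """Hypothetical stand-in for a query bundle."""

    query_str: str


class SketchPostprocessor(ABC):
    """Stand-in base class: subclasses implement _postprocess_nodes."""

    @abstractmethod
    def _postprocess_nodes(
        self, nodes: List[SketchNode], query: Optional[SketchQuery]
    ) -> List[SketchNode]:
        """Transform or filter the nodes."""

    def postprocess_nodes(
        self, nodes: List[SketchNode], query: Optional[SketchQuery] = None
    ) -> List[SketchNode]:
        return self._postprocess_nodes(nodes, query)


class QueryTermFilter(SketchPostprocessor):
    """Keep nodes whose text shares at least one word with the query."""

    def _postprocess_nodes(
        self, nodes: List[SketchNode], query: Optional[SketchQuery]
    ) -> List[SketchNode]:
        if query is None:
            # Without a query there is nothing to compare against.
            return nodes
        terms = set(query.query_str.lower().split())
        return [n for n in nodes if terms & set(n.text.lower().split())]


nodes = [SketchNode("time decay ranking", 0.8), SketchNode("unrelated text", 0.9)]
kept = QueryTermFilter().postprocess_nodes(nodes, SketchQuery("ranking by time"))
print([n.text for n in kept])
```

Returning the input unchanged when no query is supplied is a design choice of this sketch; a stricter variant could raise an error instead.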