
ChatResponseMode #

Bases: str, Enum

Flag toggling waiting/streaming in `Agent._chat`.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
class ChatResponseMode(str, Enum):
    """Flag toggling waiting/streaming in `Agent._chat`."""

    WAIT = "wait"
    STREAM = "stream"
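A quick sketch of how the enum behaves: because it mixes in `str`, members compare equal to their underlying string values, so callers can pass either the member or the raw string.

from llama_index.core.chat_engine.types import ChatResponseMode

# str-based enum members compare equal to their string values.
assert ChatResponseMode.STREAM == "stream"
assert ChatResponseMode.WAIT.value == "wait"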

AgentChatResponse dataclass #

Agent chat response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `response` | `str` | | `''` |
| `sources` | `List[ToolOutput]` | Built-in mutable sequence. If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified. | `<dynamic>` |
| `source_nodes` | `List[NodeWithScore]` | Built-in mutable sequence. If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified. | `<dynamic>` |
| `is_dummy_stream` | `bool` | | `False` |
| `metadata` | `Dict[str, Any] \| None` | | `None` |
Source code in llama-index-core/llama_index/core/chat_engine/types.py
@dataclass
class AgentChatResponse:
    """Agent chat response."""

    response: str = ""
    sources: List[ToolOutput] = field(default_factory=list)
    source_nodes: List[NodeWithScore] = field(default_factory=list)
    is_dummy_stream: bool = False
    metadata: Optional[Dict[str, Any]] = None

    def set_source_nodes(self) -> None:
        if self.sources and not self.source_nodes:
            for tool_output in self.sources:
                if isinstance(tool_output.raw_output, (Response, StreamingResponse)):
                    self.source_nodes.extend(tool_output.raw_output.source_nodes)

    def __post_init__(self) -> None:
        self.set_source_nodes()

    def __str__(self) -> str:
        return self.response

    @property
    def response_gen(self) -> Generator[str, None, None]:
        """Used for fake streaming, i.e. with tool outputs."""
        if not self.is_dummy_stream:
            raise ValueError(
                "response_gen is only available for streaming responses. "
                "Set is_dummy_stream=True if you still want a generator."
            )

        for token in self.response.split(" "):
            yield token + " "
            time.sleep(0.1)

    async def async_response_gen(self) -> AsyncGenerator[str, None]:
        """Used for fake streaming, i.e. with tool outputs."""
        if not self.is_dummy_stream:
            raise ValueError(
                "response_gen is only available for streaming responses. "
                "Set is_dummy_stream=True if you still want a generator."
            )

        for token in self.response.split(" "):
            yield token + " "
            await asyncio.sleep(0.1)

response_gen property #

response_gen: Generator[str, None, None]

Used for fake streaming, i.e. with tool outputs.

async_response_gen async #

async_response_gen() -> AsyncGenerator[str, None]

Used for fake streaming, i.e. with tool outputs.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
async def async_response_gen(self) -> AsyncGenerator[str, None]:
    """Used for fake streaming, i.e. with tool outputs."""
    if not self.is_dummy_stream:
        raise ValueError(
            "response_gen is only available for streaming responses. "
            "Set is_dummy_stream=True if you still want a generator."
        )

    for token in self.response.split(" "):
        yield token + " "
        await asyncio.sleep(0.1)
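A minimal usage sketch (the response text below is made up for illustration): with `is_dummy_stream=True`, the generators re-emit the already-complete response token by token.

from llama_index.core.chat_engine.types import AgentChatResponse

# Hypothetical response; is_dummy_stream=True enables the fake streaming generators.
resp = AgentChatResponse(response="The capital of France is Paris.", is_dummy_stream=True)

# Yields whitespace-delimited tokens with a short delay between them.
for token in resp.response_gen:
    print(token, end="", flush=True)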

StreamingAgentChatResponse dataclass #

Streaming chat response to user and writing to chat history.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `response` | `str` | | `''` |
| `sources` | `List[ToolOutput]` | Built-in mutable sequence. If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified. | `<dynamic>` |
| `chat_stream` | `Generator[ChatResponse, None, None] \| None` | | `None` |
| `achat_stream` | `AsyncGenerator[ChatResponse, None] \| None` | | `None` |
| `source_nodes` | `List[NodeWithScore]` | Built-in mutable sequence. If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified. | `<dynamic>` |
| `unformatted_response` | `str` | | `''` |
| `queue` | `Queue` | Create a queue object with a given maximum size. If maxsize is <= 0, the queue size is infinite. | `<dynamic>` |
| `aqueue` | `Queue \| None` | | `None` |
| `is_function` | `bool \| None` | | `None` |
| `new_item_event` | `Event \| None` | | `None` |
| `is_function_false_event` | `Event \| None` | | `None` |
| `is_function_not_none_thread_event` | `Event` | A class implementing event objects. An event manages a flag that can be set to true with the set() method and reset to false with the clear() method. The wait() method blocks until the flag is true. The flag is initially false. | `<dynamic>` |
| `is_writing_to_memory` | `bool` | | `True` |
| `exception` | `Exception \| None` | | `None` |
| `awrite_response_to_history_task` | `Task \| None` | | `None` |
Source code in llama-index-core/llama_index/core/chat_engine/types.py
@dataclass
class StreamingAgentChatResponse:
    """Streaming chat response to user and writing to chat history."""

    response: str = ""
    sources: List[ToolOutput] = field(default_factory=list)
    chat_stream: Optional[ChatResponseGen] = None
    achat_stream: Optional[ChatResponseAsyncGen] = None
    source_nodes: List[NodeWithScore] = field(default_factory=list)
    unformatted_response: str = ""
    queue: Queue = field(default_factory=Queue)
    aqueue: Optional[asyncio.Queue] = None
    # flag when chat message is a function call
    is_function: Optional[bool] = None
    # flag when processing done
    is_done = False
    # signal when a new item is added to the queue
    new_item_event: Optional[asyncio.Event] = None
    # NOTE: async code uses two events rather than one since it yields
    # control when waiting for queue item
    # signal when the OpenAI functions stop executing
    is_function_false_event: Optional[asyncio.Event] = None
    # signal when an OpenAI function is being executed
    is_function_not_none_thread_event: Event = field(default_factory=Event)
    is_writing_to_memory: bool = True
    # Track if an exception occurred
    exception: Optional[Exception] = None
    awrite_response_to_history_task: Optional[asyncio.Task] = None

    def set_source_nodes(self) -> None:
        if self.sources and not self.source_nodes:
            for tool_output in self.sources:
                if isinstance(tool_output.raw_output, (Response, StreamingResponse)):
                    self.source_nodes.extend(tool_output.raw_output.source_nodes)

    def __post_init__(self) -> None:
        self.set_source_nodes()

    def __str__(self) -> str:
        if self.is_done and not self.queue.empty() and not self.is_function:
            while self.queue.queue:
                delta = self.queue.queue.popleft()
                self.unformatted_response += delta
            self.response = self.unformatted_response.strip()
        return self.response

    def _ensure_async_setup(self) -> None:
        if self.aqueue is None:
            self.aqueue = asyncio.Queue()
        if self.new_item_event is None:
            self.new_item_event = asyncio.Event()
        if self.is_function_false_event is None:
            self.is_function_false_event = asyncio.Event()

    def put_in_queue(self, delta: Optional[str]) -> None:
        self.queue.put_nowait(delta)
        self.is_function_not_none_thread_event.set()

    def aput_in_queue(self, delta: Optional[str]) -> None:
        assert self.aqueue is not None
        assert self.new_item_event is not None

        self.aqueue.put_nowait(delta)
        self.new_item_event.set()

    @dispatcher.span
    def write_response_to_history(
        self,
        memory: BaseMemory,
        on_stream_end_fn: Optional[Callable] = None,
    ) -> None:
        if self.chat_stream is None:
            raise ValueError(
                "chat_stream is None. Cannot write to history without chat_stream."
            )

        # try/except to prevent hanging on error
        dispatcher.event(StreamChatStartEvent())
        try:
            final_text = ""
            for chat in self.chat_stream:
                self.is_function = is_function(chat.message)
                if chat.delta:
                    dispatcher.event(
                        StreamChatDeltaReceivedEvent(
                            delta=chat.delta,
                        )
                    )
                    self.put_in_queue(chat.delta)
                final_text += chat.delta or ""
            if self.is_function is not None:  # if loop has gone through iteration
                # NOTE: this is to handle the special case where we consume some of the
                # chat stream, but not all of it (e.g. in react agent)
                chat.message.content = final_text.strip()  # final message
                memory.put(chat.message)
        except Exception as e:
            dispatcher.event(StreamChatErrorEvent(exception=e))
            self.exception = e

            # This act as is_done events for any consumers waiting
            self.is_function_not_none_thread_event.set()

            # force the queue reader to see the exception
            self.put_in_queue("")
            raise
        dispatcher.event(StreamChatEndEvent())

        self.is_done = True

        # This act as is_done events for any consumers waiting
        self.is_function_not_none_thread_event.set()
        if on_stream_end_fn is not None and not self.is_function:
            on_stream_end_fn()

    @dispatcher.span
    async def awrite_response_to_history(
        self,
        memory: BaseMemory,
        on_stream_end_fn: Optional[Callable] = None,
    ) -> None:
        self._ensure_async_setup()
        assert self.aqueue is not None
        assert self.is_function_false_event is not None
        assert self.new_item_event is not None

        if self.achat_stream is None:
            raise ValueError(
                "achat_stream is None. Cannot asynchronously write to "
                "history without achat_stream."
            )

        # try/except to prevent hanging on error
        dispatcher.event(StreamChatStartEvent())
        try:
            final_text = ""
            async for chat in self.achat_stream:
                self.is_function = is_function(chat.message)
                if chat.delta:
                    dispatcher.event(
                        StreamChatDeltaReceivedEvent(
                            delta=chat.delta,
                        )
                    )
                    self.aput_in_queue(chat.delta)
                final_text += chat.delta or ""
                self.new_item_event.set()
                if self.is_function is False:
                    self.is_function_false_event.set()
            if self.is_function is not None:  # if loop has gone through iteration
                # NOTE: this is to handle the special case where we consume some of the
                # chat stream, but not all of it (e.g. in react agent)
                chat.message.content = final_text.strip()  # final message
                await memory.aput(chat.message)
        except Exception as e:
            dispatcher.event(StreamChatErrorEvent(exception=e))
            self.exception = e

            # These act as is_done events for any consumers waiting
            self.is_function_false_event.set()
            self.new_item_event.set()

            # force the queue reader to see the exception
            self.aput_in_queue("")
            raise
        dispatcher.event(StreamChatEndEvent())
        self.is_done = True

        # These act as is_done events for any consumers waiting
        self.is_function_false_event.set()
        self.new_item_event.set()
        if on_stream_end_fn is not None and not self.is_function:
            if iscoroutinefunction(
                on_stream_end_fn.func
                if isinstance(on_stream_end_fn, partial)
                else on_stream_end_fn
            ):
                await on_stream_end_fn()
            else:
                on_stream_end_fn()

    @property
    def response_gen(self) -> Generator[str, None, None]:
        yielded_once = False
        if self.is_writing_to_memory:
            while not self.is_done or not self.queue.empty():
                if self.exception is not None:
                    raise self.exception

                try:
                    delta = self.queue.get(block=False)
                    self.unformatted_response += delta
                    yield delta
                    yielded_once = True
                except Empty:
                    # Queue is empty, but we're not done yet. Sleep for 0 secs to release the GIL and allow other threads to run.
                    time.sleep(0)
        else:
            if self.chat_stream is None:
                raise ValueError("chat_stream is None!")

            for chat_response in self.chat_stream:
                self.unformatted_response += chat_response.delta or ""
                yield chat_response.delta or ""
                yielded_once = True

        self.response = self.unformatted_response.strip()

        # edge case where the stream was exhausted before yielding anything
        if not yielded_once:
            yield self.response

    async def async_response_gen(self) -> AsyncGenerator[str, None]:
        try:
            yielded_once = False
            self._ensure_async_setup()
            assert self.aqueue is not None

            if self.is_writing_to_memory:
                while True:
                    if not self.aqueue.empty() or not self.is_done:
                        if self.exception is not None:
                            raise self.exception

                        try:
                            delta = await asyncio.wait_for(
                                self.aqueue.get(), timeout=0.1
                            )
                        except asyncio.TimeoutError:
                            if self.is_done:
                                break
                            continue
                        if delta is not None:
                            self.unformatted_response += delta
                            yield delta
                            yielded_once = True
                    else:
                        break
            else:
                if self.achat_stream is None:
                    raise ValueError("achat_stream is None!")

                async for chat_response in self.achat_stream:
                    self.unformatted_response += chat_response.delta or ""
                    yield chat_response.delta or ""
                    yielded_once = True
            self.response = self.unformatted_response.strip()

            # edge case where the stream was exhausted before yielding anything
            if not yielded_once:
                yield self.response
        finally:
            if self.awrite_response_to_history_task:
                # Make sure that the background task ran to completion, retrieve any exceptions
                await self.awrite_response_to_history_task
                self.awrite_response_to_history_task = (
                    None  # No need to keep the reference to the finished task
                )

    def print_response_stream(self) -> None:
        for token in self.response_gen:
            print(token, end="", flush=True)

    async def aprint_response_stream(self) -> None:
        async for token in self.async_response_gen():
            print(token, end="", flush=True)
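A rough sketch of the intended wiring (the `llm`, `memory`, and `chat_messages` objects are assumed to exist; the background-thread pattern mirrors what the built-in chat engines do): a producer writes streamed deltas into the queue and chat history while the caller consumes `response_gen`.

from threading import Thread

from llama_index.core.chat_engine.types import StreamingAgentChatResponse

# `llm` (an LLM supporting stream_chat), `memory` (a BaseMemory), and
# `chat_messages` (List[ChatMessage]) are assumed to be set up elsewhere.
chat_response = StreamingAgentChatResponse(chat_stream=llm.stream_chat(chat_messages))

# Producer: write streamed deltas to chat memory in a background thread.
thread = Thread(target=chat_response.write_response_to_history, args=(memory,))
thread.start()

# Consumer: stream tokens to the user as they arrive.
for token in chat_response.response_gen:
    print(token, end="", flush=True)

thread.join()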

BaseChatEngine #

Bases: DispatcherSpanMixin, ABC

Base chat engine.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
class BaseChatEngine(DispatcherSpanMixin, ABC):
    """Base Chat Engine."""

    @abstractmethod
    def reset(self) -> None:
        """Reset conversation state."""

    @abstractmethod
    def chat(
        self, message: str, chat_history: Optional[List[ChatMessage]] = None
    ) -> AGENT_CHAT_RESPONSE_TYPE:
        """Main chat interface."""

    @abstractmethod
    def stream_chat(
        self, message: str, chat_history: Optional[List[ChatMessage]] = None
    ) -> StreamingAgentChatResponse:
        """Stream chat interface."""

    @abstractmethod
    async def achat(
        self, message: str, chat_history: Optional[List[ChatMessage]] = None
    ) -> AGENT_CHAT_RESPONSE_TYPE:
        """Async version of main chat interface."""

    @abstractmethod
    async def astream_chat(
        self, message: str, chat_history: Optional[List[ChatMessage]] = None
    ) -> StreamingAgentChatResponse:
        """Async version of main chat interface."""

    def chat_repl(self) -> None:
        """Enter interactive chat REPL."""
        print("===== Entering Chat REPL =====")
        print('Type "exit" to exit.\n')
        self.reset()
        message = input("Human: ")
        while message != "exit":
            response = self.chat(message)
            print(f"Assistant: {response}\n")
            message = input("Human: ")

    def streaming_chat_repl(self) -> None:
        """Enter interactive chat REPL with streaming responses."""
        print("===== Entering Chat REPL =====")
        print('Type "exit" to exit.\n')
        self.reset()
        message = input("Human: ")
        while message != "exit":
            response = self.stream_chat(message)
            print("Assistant: ", end="", flush=True)
            response.print_response_stream()
            print("\n")
            message = input("Human: ")

    @property
    @abstractmethod
    def chat_history(self) -> List[ChatMessage]:
        pass
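As an illustration (not an engine shipped with the library), a concrete subclass only has to implement the abstract methods and the `chat_history` property; a minimal echo engine could look like this:

from typing import List, Optional

from llama_index.core.chat_engine.types import (
    AgentChatResponse,
    BaseChatEngine,
    StreamingAgentChatResponse,
)
from llama_index.core.llms import ChatMessage, MessageRole


class EchoChatEngine(BaseChatEngine):
    """Hypothetical engine that simply echoes the user's message."""

    def __init__(self) -> None:
        self._history: List[ChatMessage] = []

    def reset(self) -> None:
        self._history = []

    def chat(
        self, message: str, chat_history: Optional[List[ChatMessage]] = None
    ) -> AgentChatResponse:
        self._history.append(ChatMessage(role=MessageRole.USER, content=message))
        response = AgentChatResponse(response=f"You said: {message}")
        self._history.append(
            ChatMessage(role=MessageRole.ASSISTANT, content=response.response)
        )
        return response

    async def achat(
        self, message: str, chat_history: Optional[List[ChatMessage]] = None
    ) -> AgentChatResponse:
        return self.chat(message, chat_history)

    def stream_chat(
        self, message: str, chat_history: Optional[List[ChatMessage]] = None
    ) -> StreamingAgentChatResponse:
        raise NotImplementedError("Streaming is out of scope for this sketch.")

    async def astream_chat(
        self, message: str, chat_history: Optional[List[ChatMessage]] = None
    ) -> StreamingAgentChatResponse:
        raise NotImplementedError("Streaming is out of scope for this sketch.")

    @property
    def chat_history(self) -> List[ChatMessage]:
        return self._history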

reset abstractmethod #

reset() -> None

Reset conversation state.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
@abstractmethod
def reset(self) -> None:
    """Reset conversation state."""

chat abstractmethod #

chat(message: str, chat_history: Optional[List[ChatMessage]] = None) -> AGENT_CHAT_RESPONSE_TYPE

Main chat interface.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
@abstractmethod
def chat(
    self, message: str, chat_history: Optional[List[ChatMessage]] = None
) -> AGENT_CHAT_RESPONSE_TYPE:
    """Main chat interface."""

stream_chat abstractmethod #

stream_chat(message: str, chat_history: Optional[List[ChatMessage]] = None) -> StreamingAgentChatResponse

Stream chat interface.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
@abstractmethod
def stream_chat(
    self, message: str, chat_history: Optional[List[ChatMessage]] = None
) -> StreamingAgentChatResponse:
    """Stream chat interface."""

achat abstractmethod async #

achat(message: str, chat_history: Optional[List[ChatMessage]] = None) -> AGENT_CHAT_RESPONSE_TYPE

Async version of main chat interface.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
@abstractmethod
async def achat(
    self, message: str, chat_history: Optional[List[ChatMessage]] = None
) -> AGENT_CHAT_RESPONSE_TYPE:
    """Async version of main chat interface."""

astream_chat abstractmethod async #

astream_chat(message: str, chat_history: Optional[List[ChatMessage]] = None) -> StreamingAgentChatResponse

Async version of main chat interface.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
@abstractmethod
async def astream_chat(
    self, message: str, chat_history: Optional[List[ChatMessage]] = None
) -> StreamingAgentChatResponse:
    """Async version of main chat interface."""

chat_repl #

chat_repl() -> None

Enter interactive chat REPL.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
def chat_repl(self) -> None:
    """Enter interactive chat REPL."""
    print("===== Entering Chat REPL =====")
    print('Type "exit" to exit.\n')
    self.reset()
    message = input("Human: ")
    while message != "exit":
        response = self.chat(message)
        print(f"Assistant: {response}\n")
        message = input("Human: ")

streaming_chat_repl #

streaming_chat_repl() -> None

Enter interactive chat REPL with streaming responses.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
def streaming_chat_repl(self) -> None:
    """Enter interactive chat REPL with streaming responses."""
    print("===== Entering Chat REPL =====")
    print('Type "exit" to exit.\n')
    self.reset()
    message = input("Human: ")
    while message != "exit":
        response = self.stream_chat(message)
        print("Assistant: ", end="", flush=True)
        response.print_response_stream()
        print("\n")
        message = input("Human: ")

ChatMode #

Bases: str, Enum

Chat Engine Modes.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
class ChatMode(str, Enum):
    """Chat Engine Modes."""

    SIMPLE = "simple"
    """Corresponds to `SimpleChatEngine`.

    Chat with LLM, without making use of a knowledge base.
    """

    CONDENSE_QUESTION = "condense_question"
    """Corresponds to `CondenseQuestionChatEngine`.

    First generate a standalone question from conversation context and last message,
    then query the query engine for a response.
    """

    CONTEXT = "context"
    """Corresponds to `ContextChatEngine`.

    First retrieve text from the index using the user's message, then use the context
    in the system prompt to generate a response.
    """

    CONDENSE_PLUS_CONTEXT = "condense_plus_context"
    """Corresponds to `CondensePlusContextChatEngine`.

    First condense a conversation and latest user message to a standalone question.
    Then build a context for the standalone question from a retriever,
    Then pass the context along with prompt and user message to LLM to generate a response.
    """

    REACT = "react"
    """Corresponds to `ReActAgent`.

    Use a ReAct agent loop with query engine tools.
    """

    OPENAI = "openai"
    """Corresponds to `OpenAIAgent`.

    Use an OpenAI function calling agent loop.

    NOTE: only works with OpenAI models that support function calling API.
    """

    BEST = "best"
    """Select the best chat engine based on the current LLM.

    Corresponds to `OpenAIAgent` if using an OpenAI model that supports
    function calling API, otherwise, corresponds to `ReActAgent`.
    """

SIMPLE class-attribute instance-attribute #

SIMPLE = 'simple'

Corresponds to SimpleChatEngine.

Chat with LLM, without making use of a knowledge base.

CONDENSE_QUESTION class-attribute instance-attribute #

CONDENSE_QUESTION = 'condense_question'

Corresponds to CondenseQuestionChatEngine.

First generate a standalone question from the conversation context and last message, then query the query engine for a response.

CONTEXT class-attribute instance-attribute #

CONTEXT = 'context'

Corresponds to ContextChatEngine.

First retrieve text from the index using the user's message, then use that context in the system prompt to generate a response.

CONDENSE_PLUS_CONTEXT class-attribute instance-attribute #

CONDENSE_PLUS_CONTEXT = 'condense_plus_context'

Corresponds to CondensePlusContextChatEngine.

First condense the conversation and latest user message into a standalone question. Then build a context for the standalone question from a retriever. Finally, pass the context along with the prompt and user message to the LLM to generate a response.

REACT class-attribute instance-attribute #

REACT = 'react'

Corresponds to ReActAgent.

Use a ReAct agent loop with query engine tools.

OPENAI class-attribute instance-attribute #

OPENAI = 'openai'

Corresponds to OpenAIAgent.

Use an OpenAI function calling agent loop.

NOTE: only works with OpenAI models that support the function calling API.

BEST class-attribute instance-attribute #

BEST = 'best'

Select the best chat engine based on the current LLM.

Corresponds to OpenAIAgent if using an OpenAI model that supports the function calling API; otherwise, corresponds to ReActAgent.

is_function #

is_function(message: ChatMessage) -> bool

Utility for ChatMessage responses from OpenAI models.

Source code in llama-index-core/llama_index/core/chat_engine/types.py
def is_function(message: ChatMessage) -> bool:
    """Utility for ChatMessage responses from OpenAI models."""
    return (
        "tool_calls" in message.additional_kwargs
        and len(message.additional_kwargs["tool_calls"]) > 0
    )
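A small illustrative check (the tool-call payload below is made up; real payloads come from the OpenAI API):

from llama_index.core.chat_engine.types import is_function
from llama_index.core.llms import ChatMessage, MessageRole

# A plain assistant message: no tool calls in additional_kwargs.
plain = ChatMessage(role=MessageRole.ASSISTANT, content="Hello!")
print(is_function(plain))  # False

# A message carrying tool calls (payload shape is illustrative only).
tool_msg = ChatMessage(
    role=MessageRole.ASSISTANT,
    content="",
    additional_kwargs={"tool_calls": [{"function": {"name": "search", "arguments": "{}"}}]},
)
print(is_function(tool_msg))  # True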