autogen_ext.models.llama_cpp

class LlamaCppChatCompletionClient(model_info: ModelInfo | None = None, **kwargs: Unpack)

Bases: ChatCompletionClient

Chat completion client for LlamaCpp models. To use this client, you must install the llama-cpp extra:

pip install "autogen-ext[llama-cpp]"

This client lets you interact with LlamaCpp models, either by specifying a local model path or by downloading a model from the Hugging Face Hub.

Parameters:

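model_path (optional, str) – The path to the LlamaCpp model file. Required if repo_id and filename are not provided.
repo_id (optional, str) – The Hugging Face Hub repository ID. Required if model_path is not provided.
filename (optional, str) – The filename of the model within the Hugging Face Hub repository. Required if model_path is not provided.
n_gpu_layers (optional, int) – The number of layers to offload to the GPU; -1 offloads all layers.
n_ctx (optional, int) – The context size, in tokens.
n_batch (optional, int) – The batch size used for prompt processing.
verbose (optional, bool) – Whether to print verbose output.
model_info (optional, ModelInfo) – The capabilities of the model. Defaults to a ModelInfo instance with function_calling set to True.
**kwargs – Additional parameters passed to the Llama class.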
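The parameter list implies a rule about where the weights come from: either model_path, or both repo_id and filename. The sketch below encodes that rule as plain Python so the valid combinations are explicit. ModelSource and resolve_model_source are hypothetical names, not part of the client, and this page does not say which source wins when both are supplied; the sketch arbitrarily prefers model_path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSource:
    """Hypothetical holder for the three source-related constructor arguments."""

    model_path: Optional[str] = None
    repo_id: Optional[str] = None
    filename: Optional[str] = None


def resolve_model_source(src: ModelSource) -> str:
    """Return "local" or "hub" following the documented requirement.

    Assumption: if model_path and repo_id/filename are all given, this sketch
    prefers model_path; the real client's precedence is not documented here.
    """
    if src.model_path is not None:
        return "local"
    if src.repo_id is not None and src.filename is not None:
        return "hub"
    raise ValueError("Provide model_path, or both repo_id and filename.")


print(resolve_model_source(ModelSource(model_path="/path/to/your/model.gguf")))
print(resolve_model_source(ModelSource(repo_id="unsloth/phi-4-GGUF", filename="phi-4-Q2_K_L.gguf")))
```

A repo_id without a filename (or the reverse) is rejected, since neither branch of the documented rule is satisfied.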

Examples

The following snippet shows how to use the client with a local model file:

import asyncio

from autogen_core.models import UserMessage
from autogen_ext.models.llama_cpp import LlamaCppChatCompletionClient


async def main():
    llama_client = LlamaCppChatCompletionClient(model_path="/path/to/your/model.gguf")
    result = await llama_client.create([UserMessage(content="What is the capital of France?", source="user")])
    print(result)


asyncio.run(main())

The following snippet shows how to use the client with a model from the Hugging Face Hub:

import asyncio

from autogen_core.models import UserMessage
from autogen_ext.models.llama_cpp import LlamaCppChatCompletionClient


async def main():
    llama_client = LlamaCppChatCompletionClient(
        repo_id="unsloth/phi-4-GGUF", filename="phi-4-Q2_K_L.gguf", n_gpu_layers=-1, seed=1337, n_ctx=5000
    )
    result = await llama_client.create([UserMessage(content="What is the capital of France?", source="user")])
    print(result)


asyncio.run(main())

async create(messages: Sequence[Annotated[SystemMessage | UserMessage | AssistantMessage | FunctionExecutionResultMessage, FieldInfo(annotation=NoneType, required=True, discriminator='type')]], *, tools: Sequence[Tool | ToolSchema] = [], json_output: bool | None = None, extra_create_args: Mapping[str, Any] = {}, cancellation_token: CancellationToken | None = None) → CreateResult

async create_stream(messages: Sequence[Annotated[SystemMessage | UserMessage | AssistantMessage | FunctionExecutionResultMessage, FieldInfo(annotation=NoneType, required=True, discriminator='type')]], *, tools: Sequence[Tool | ToolSchema] = [], json_output: bool | None = None, extra_create_args: Mapping[str, Any] = {}, cancellation_token: CancellationToken | None = None) → AsyncGenerator[str | CreateResult, None]
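The return type AsyncGenerator[str | CreateResult, None] means a consumer has to tell text chunks apart from the result object. The sketch below shows one way to do that, assuming (as the type suggests) that text arrives as str chunks and a single CreateResult arrives last. FakeCreateResult and fake_stream are hypothetical stand-ins so the example runs without a model file.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, Optional, Tuple, Union


@dataclass
class FakeCreateResult:
    """Hypothetical stand-in for autogen_core.models.CreateResult."""

    content: str


async def fake_stream() -> AsyncGenerator[Union[str, FakeCreateResult], None]:
    """Hypothetical stand-in for create_stream(): text chunks, then one final result."""
    for chunk in ["Paris", " is", " the capital."]:
        yield chunk
    yield FakeCreateResult(content="Paris is the capital.")


async def consume(
    stream: AsyncGenerator[Union[str, FakeCreateResult], None],
) -> Tuple[str, FakeCreateResult]:
    """Collect the streamed text and the final result object."""
    parts: list = []
    final: Optional[FakeCreateResult] = None
    async for item in stream:
        if isinstance(item, str):
            parts.append(item)  # incremental text chunk, e.g. to print as it arrives
        else:
            final = item  # the complete result, assumed to be the last item
    if final is None:
        raise RuntimeError("stream ended without a final result")
    return "".join(parts), final


streamed_text, final_result = asyncio.run(consume(fake_stream()))
print(streamed_text)
```

With the real client you would pass `llama_client.create_stream([UserMessage(content="What is the capital of France?", source="user")])` instead of `fake_stream()` and check `isinstance(item, CreateResult)` using `from autogen_core.models import CreateResult`.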

actual_usage() → RequestUsage

property capabilities: ModelInfo

count_tokens(messages: Sequence[SystemMessage | UserMessage | AssistantMessage | FunctionExecutionResultMessage], **kwargs: Any) → int

property model_info: ModelInfo

remaining_tokens(messages: Sequence[SystemMessage | UserMessage | AssistantMessage | FunctionExecutionResultMessage], **kwargs: Any) → int
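remaining_tokens presumably reports how much of the context window is left after the messages are counted with count_tokens. The sketch below works that arithmetic through with plain integers, assuming remaining = n_ctx - count_tokens(messages), clamped at zero; this page does not state the formula. remaining_tokens_sketch and reply_fits are hypothetical helpers, not methods of the client.

```python
def remaining_tokens_sketch(n_ctx: int, prompt_tokens: int) -> int:
    """Hypothetical: tokens left in the context window after the prompt.

    Assumes remaining = n_ctx - prompt_tokens, floored at zero.
    """
    return max(n_ctx - prompt_tokens, 0)


def reply_fits(n_ctx: int, prompt_tokens: int, max_reply_tokens: int) -> bool:
    """Hypothetical check: is there room left for a reply of max_reply_tokens?"""
    return remaining_tokens_sketch(n_ctx, prompt_tokens) >= max_reply_tokens


# n_ctx=5000 matches the Hugging Face example; the 1200-token prompt is made up.
print(remaining_tokens_sketch(5000, 1200))  # 3800
print(reply_fits(5000, 1200, 4000))  # False
```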

total_usage() → RequestUsage
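RequestUsage in autogen_core.models carries prompt_tokens and completion_tokens counts. The sketch below illustrates a running total of the kind total_usage() is assumed to return, summing each call's usage; this page does not document how total_usage() differs from actual_usage(). Usage and UsageTracker are hypothetical stand-ins so the example runs without the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical mirror of RequestUsage's prompt_tokens / completion_tokens fields."""

    prompt_tokens: int
    completion_tokens: int


class UsageTracker:
    """Hypothetical running total, accumulated once per create()/create_stream() call."""

    def __init__(self) -> None:
        self._total = Usage(prompt_tokens=0, completion_tokens=0)

    def record(self, usage: Usage) -> None:
        # Add one call's usage to the running total.
        self._total = Usage(
            prompt_tokens=self._total.prompt_tokens + usage.prompt_tokens,
            completion_tokens=self._total.completion_tokens + usage.completion_tokens,
        )

    def total_usage(self) -> Usage:
        return self._total


tracker = UsageTracker()
tracker.record(Usage(prompt_tokens=12, completion_tokens=8))
tracker.record(Usage(prompt_tokens=20, completion_tokens=5))
print(tracker.total_usage())  # Usage(prompt_tokens=32, completion_tokens=13)
```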

async close() → None

Close the LlamaCpp client.
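Because close() is a coroutine, one way to guarantee it runs even when a request fails is a small async context manager. The standard library's contextlib.aclosing calls aclose(), not close(), so the sketch below defines its own wrapper. closing_client and FakeClient are hypothetical names; FakeClient stands in for LlamaCppChatCompletionClient so the example runs without a model.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Protocol, TypeVar


class SupportsAsyncClose(Protocol):
    async def close(self) -> None: ...


C = TypeVar("C", bound=SupportsAsyncClose)


@asynccontextmanager
async def closing_client(client: C) -> AsyncIterator[C]:
    """Yield the client, then always await client.close(), even if the body raises."""
    try:
        yield client
    finally:
        await client.close()


class FakeClient:
    """Hypothetical stand-in for LlamaCppChatCompletionClient."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def demo() -> FakeClient:
    client = FakeClient()
    async with closing_client(client) as c:
        print("closed inside block:", c.closed)  # False
    return client


client = asyncio.run(demo())
print("closed after block:", client.closed)  # True
```

With the real client this would read `async with closing_client(LlamaCppChatCompletionClient(model_path="/path/to/your/model.gguf")) as llama_client: ...`.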