Ollama


Ollama is a local inference engine that enables you to run open-weight LLMs in your environment. It has native support for a large number of models, such as Google's Gemma, Meta's Llama 2/3/3.1, Microsoft's Phi 3, Mistral AI's Mistral/Mixtral, and Cohere's Command R models.

Note: Previously, Ollama required LiteLLM to be used with AutoGen. It can now be used directly, with tool calling supported.

Features

When using this Ollama client class, messages are tailored to the specific requirements of Ollama's API, including message role sequences, support for function/tool calling, and token usage.

Installing Ollama

For Mac and Windows, download Ollama.

For Linux:

curl -fsSL https://ollama.com/install.sh | sh
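
The install script normally sets Ollama up as a service that starts automatically. If the server isn't running on your machine, you can start it manually:

ollama serve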

Downloading models for Ollama

Ollama has a library of models to choose from; see them here.

Before you can use a model, you need to download it (using the model name from the library):

ollama pull llama3.1

To view the models you have downloaded and can use:

ollama list
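
As a quick sanity check that the server and your downloaded model are working, you can call Ollama directly from Python. This is a minimal sketch, assuming the ollama Python package is installed (pip install ollama):

import ollama

# Equivalent to `ollama list` on the command line
print(ollama.list())

# One-off prompt against the model pulled above
response = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": "Say hello."}])
print(response["message"]["content"])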

Getting started with AutoGen and Ollama

When installing AutoGen, you need to install the pyautogen package with the Ollama library:

pip install pyautogen[ollama]

See the sample OAI_CONFIG_LIST below, showing how the Ollama client class is used by specifying the api_type as ollama.

[
    {
        "model": "llama3.1",
        "api_type": "ollama"
    },
    {
        "model": "llama3.1:8b-instruct-q6_K",
        "api_type": "ollama"
    },
    {
        "model": "mistral-nemo",
        "api_type": "ollama"
    }
]
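
When loading these configs in AutoGen, you can filter on api_type. The following is a minimal sketch, assuming the list above has been saved to a file named OAI_CONFIG_LIST:

import autogen

# Load the config file and keep only the Ollama entries
config_list = autogen.config_list_from_json(
    env_or_file="OAI_CONFIG_LIST",
    filter_dict={"api_type": ["ollama"]},
)
print(config_list)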

If you need to specify the URL for your Ollama install, use the client_host key in your config, as per the example below (Ollama's API listens on port 11434 by default):

[
    {
        "model": "llama3.1",
        "api_type": "ollama",
        "client_host": "http://192.168.0.1:11434"
    }
]

API parameters

The following Ollama parameters can be added to your config. See this link for further information on them.

  • num_predict (integer): -1 is infinite, -2 is fill context, 128 is the default
  • repeat_penalty (float)
  • seed (integer)
  • stream (boolean)
  • temperature (float)
  • top_k (integer)
  • top_p (float)

Example:

[
    {
        "model": "llama3.1:instruct",
        "api_type": "ollama",
        "num_predict": -1,
        "repeat_penalty": 1.1,
        "seed": 42,
        "stream": False,
        "temperature": 1,
        "top_k": 50,
        "top_p": 0.8
    }
]

Two-agent coding example

In this example, we run a two-agent chat with an AssistantAgent (primarily a coding agent) that generates code to count the number of prime numbers between 1 and 10,000, which is then executed.

We'll use Meta's Llama 3.1 model, which is suitable for coding.

In this example, we'll specify the URL of our Ollama installation using client_host.

config_list = [
    {
        # Let's choose Meta's Llama 3.1 model (model names must match Ollama exactly)
        "model": "llama3.1:8b",
        # We specify the API Type as 'ollama' so it uses the Ollama client class
        "api_type": "ollama",
        "stream": False,
        "client_host": "http://192.168.0.1:11434",
    }
]

Importantly, we have tweaked the system message so that the model doesn't return the termination keyword (which we've changed to FINISH) together with a code block.

from pathlib import Path

from autogen import AssistantAgent, UserProxyAgent
from autogen.coding import LocalCommandLineCodeExecutor

# Setting up the code executor
workdir = Path("coding")
workdir.mkdir(exist_ok=True)
code_executor = LocalCommandLineCodeExecutor(work_dir=workdir)

# Setting up the agents

# The UserProxyAgent will execute the code that the AssistantAgent provides
user_proxy_agent = UserProxyAgent(
name="User",
code_execution_config={"executor": code_executor},
is_termination_msg=lambda msg: "FINISH" in msg.get("content"),
)

system_message = """You are a helpful AI assistant who writes code and the user
executes it. Solve tasks using your python coding skills.
In the following cases, suggest python code (in a python coding block) for the
user to execute. When using code, you must indicate the script type in the code block.
You only need to create one working sample.
Do not suggest incomplete code which requires users to modify it.
Don't use a code block if it's not intended to be executed by the user. Don't
include multiple code blocks in one response. Do not ask users to copy and
paste the result. Instead, use 'print' function for the output when relevant.
Check the execution result returned by the user.

If the result indicates there is an error, fix the error.

IMPORTANT: If it has executed successfully, ONLY output 'FINISH'."""

# The AssistantAgent, using the Ollama config, will take the coding request and return code
assistant_agent = AssistantAgent(
name="Ollama Assistant",
system_message=system_message,
llm_config={"config_list": config_list},
)

We can now start the chat.

# Start the chat, with the UserProxyAgent asking the AssistantAgent the message
chat_result = user_proxy_agent.initiate_chat(
    assistant_agent,
    message="Provide code to count the number of prime numbers from 1 to 10000.",
)
User (to Ollama Assistant):

Provide code to count the number of prime numbers from 1 to 10000.

--------------------------------------------------------------------------------
Ollama Assistant (to User):

```python
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

count = sum(is_prime(i) for i in range(1, 10001))
print(count)
```

Please execute this code. I will wait for the result.

--------------------------------------------------------------------------------

>>>>>>>> NO HUMAN INPUT RECEIVED.

>>>>>>>> USING AUTO REPLY...

>>>>>>>> EXECUTING CODE BLOCK (inferred language is python)...
User (to Ollama Assistant):

exitcode: 0 (execution succeeded)
Code output: 1229


--------------------------------------------------------------------------------
Ollama Assistant (to User):

FINISH

--------------------------------------------------------------------------------

>>>>>>>> NO HUMAN INPUT RECEIVED.
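
If you want to inspect the conversation afterwards, the ChatResult returned by initiate_chat carries the message history and cost information; a minimal sketch:

# Review the messages exchanged during the chat
for message in chat_result.chat_history:
    print(f"{message['role']}: {str(message['content'])[:60]}")

# Token/cost accounting gathered by AutoGen
print(chat_result.cost)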

Tool calling - Native vs Manual

Ollama supports native tool calling (Ollama library v0.3.1 and above). If you install AutoGen with pip install pyautogen[ollama] you will be able to use native tool calling.

The native_tool_calls parameter in your configuration allows you to specify whether to use Ollama's native tool calling (the default) or manual tool calling.

[
    {
        "model": "llama3.1",
        "api_type": "ollama",
        "client_host": "http://192.168.0.1:11434",
        "native_tool_calls": True # Use Ollama's native tool calling, False for manual
    }
]

Native tool calling only works with certain models, and an exception will be thrown if you try to use it with an unsupported model.

Manual tool calling allows you to use tool calling with any Ollama model. It incorporates guiding messages into the prompt that walk the LLM through selecting a tool and then evaluating the tool's result. As you would expect, the ability to follow the instructions and return well-formatted JSON is highly dependent on the model.

You can customise the manual tool calling messages by adding these parameters to your configuration:

  • manual_tool_call_instruction
  • manual_tool_call_step1
  • manual_tool_call_step2

To use manual tool calling, set native_tool_calls to False.
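
For illustration, a config that switches to manual tool calling and overrides the guiding instruction could look like the sketch below; the manual_tool_call_instruction text here is an invented example, not the library's default wording:

[
    {
        "model": "llama3.1",
        "api_type": "ollama",
        "client_host": "http://192.168.0.1:11434",
        "native_tool_calls": False,
        # Hypothetical custom wording for the tool-selection instruction
        "manual_tool_call_instruction": "Choose one of the tools below and reply with JSON only."
    }
]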

Reducing repetitive tool calls

By incorporating tools into a conversation, LLMs can often continually recommend calling them, even after they've been called and their results returned. This can lead to a never-ending cycle of tool calls.

To remove the chance of the LLM recommending a tool call, an additional parameter called hide_tools can be used to specify when tools are hidden from the LLM. The string values for the parameter are:

  • 'never': tools are never hidden
  • 'if_all_run': tools are hidden once all tools have been called
  • 'if_any_run': tools are hidden once any tool has been called

This can be used with native or manual tool calling; an example configuration is shown below.

[
    {
        "model": "llama3.1",
        "api_type": "ollama",
        "client_host": "http://192.168.0.1:11434",
        "native_tool_calls": True,
        "hide_tools": "if_any_run" # Hide tools once any tool has been called
    }
]

Tool calling example

In this example, instead of writing code, we will have an agent assist with some trip planning using multiple tool calls.

Again, we'll use Meta's versatile Llama 3.1.

Native Ollama tool calling will be used, and we'll utilise the hide_tools parameter to hide the tools once they have all been called.

import json
from typing import Literal

from typing_extensions import Annotated

import autogen

config_list = [
    {
        # Let's choose Meta's Llama 3.1 model (model names must match Ollama exactly)
        "model": "llama3.1:8b",
        "api_type": "ollama",
        "stream": False,
        "client_host": "http://192.168.0.1:11434",
        "hide_tools": "if_any_run",
    }
]

We'll create our agents. Importantly, we're using native Ollama tool calling, and we add example JSON to the system_message to help guide it, so that number fields aren't wrapped in quotes (becoming strings).

# Create the agent for tool calling
chatbot = autogen.AssistantAgent(
name="chatbot",
system_message="""For currency exchange and weather forecasting tasks,
only use the functions you have been provided with.
Example of the return JSON is:
{
"parameter_1_name": 100.00,
"parameter_2_name": "ABC",
"parameter_3_name": "DEF",
}.
Another example of the return JSON is:
{
"parameter_1_name": "GHI",
"parameter_2_name": "ABC",
"parameter_3_name": "DEF",
"parameter_4_name": 123.00,
}.
Output 'HAVE FUN!' when an answer has been provided.""",
llm_config={"config_list": config_list},
)

# Note that we have changed the termination string to be "HAVE FUN!"
user_proxy = autogen.UserProxyAgent(
name="user_proxy",
is_termination_msg=lambda x: x.get("content", "") and "HAVE FUN!" in x.get("content", ""),
human_input_mode="NEVER",
max_consecutive_auto_reply=1,
)

Create and register our functions (tools). See the tutorial chapter on tool use for more information.

# Currency Exchange function

CurrencySymbol = Literal["USD", "EUR"]

# Define our function that we expect to call


def exchange_rate(base_currency: CurrencySymbol, quote_currency: CurrencySymbol) -> float:
    if base_currency == quote_currency:
        return 1.0
    elif base_currency == "USD" and quote_currency == "EUR":
        return 1 / 1.1
    elif base_currency == "EUR" and quote_currency == "USD":
        return 1.1
    else:
        raise ValueError(f"Unknown currencies {base_currency}, {quote_currency}")


# Register the function with the agent


@user_proxy.register_for_execution()
@chatbot.register_for_llm(description="Currency exchange calculator.")
def currency_calculator(
    base_amount: Annotated[
        float,
        "Amount of currency in base_currency. Type is float, not string, return value should be a number only, e.g. 987.65.",
    ],
    base_currency: Annotated[CurrencySymbol, "Base currency"] = "USD",
    quote_currency: Annotated[CurrencySymbol, "Quote currency"] = "EUR",
) -> str:
    quote_amount = exchange_rate(base_currency, quote_currency) * base_amount
    return f"{format(quote_amount, '.2f')} {quote_currency}"


# Weather function


# Example function to make available to model
def get_current_weather(location, unit="fahrenheit"):
"""Get the weather for some location"""
if "chicago" in location.lower():
return json.dumps({"location": "Chicago", "temperature": "13", "unit": unit})
elif "san francisco" in location.lower():
return json.dumps({"location": "San Francisco", "temperature": "55", "unit": unit})
elif "new york" in location.lower():
return json.dumps({"location": "New York", "temperature": "11", "unit": unit})
else:
return json.dumps({"location": location, "temperature": "unknown"})


# Register the function with the agent


@user_proxy.register_for_execution()
@chatbot.register_for_llm(description="Weather forecast for US cities.")
def weather_forecast(
    location: Annotated[str, "City name"],
) -> str:
    weather_details = get_current_weather(location=location)
    weather = json.loads(weather_details)
    return f"{weather['location']} will be {weather['temperature']} degrees {weather['unit']}"

And run it!

# start the conversation
res = user_proxy.initiate_chat(
    chatbot,
    message="What's the weather in New York and can you tell me how much is 123.45 EUR in USD so I can spend it on my holiday? Throw a few holiday tips in as well.",
    summary_method="reflection_with_llm",
)

print(f"LLM SUMMARY: {res.summary['content']}")
user_proxy (to chatbot):

What's the weather in New York and can you tell me how much is 123.45 EUR in USD so I can spend it on my holiday? Throw a few holiday tips in as well.

--------------------------------------------------------------------------------
chatbot (to user_proxy):


***** Suggested tool call (ollama_func_4506): weather_forecast *****
Arguments:
{"location": "New York"}
********************************************************************
***** Suggested tool call (ollama_func_4507): currency_calculator *****
Arguments:
{"base_amount": 123.45, "base_currency": "EUR", "quote_currency": "USD"}
***********************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION weather_forecast...

>>>>>>>> EXECUTING FUNCTION currency_calculator...
user_proxy (to chatbot):

user_proxy (to chatbot):

***** Response from calling tool (ollama_func_4506) *****
New York will be 11 degrees fahrenheit
*********************************************************

--------------------------------------------------------------------------------
user_proxy (to chatbot):

***** Response from calling tool (ollama_func_4507) *****
135.80 USD
*********************************************************

--------------------------------------------------------------------------------
chatbot (to user_proxy):

Based on the results, it seems that:

* The weather forecast for New York is expected to be around 11 degrees Fahrenheit.
* The exchange rate for EUR to USD is currently 1 EUR = 1.3580 USD, so 123.45 EUR is equivalent to approximately 135.80 USD.

As a bonus, here are some holiday tips in New York:

* Be sure to try a classic New York-style hot dog from a street cart or a diner.
* Explore the iconic Central Park and take a stroll through the High Line for some great views of the city.
* Catch a Broadway show or a concert at one of the many world-class venues in the city.

And... HAVE FUN!

--------------------------------------------------------------------------------
LLM SUMMARY: The weather forecast for New York is expected to be around 11 degrees Fahrenheit.
123.45 EUR is equivalent to approximately 135.80 USD.
Try a classic New York-style hot dog, explore Central Park and the High Line, and catch a Broadway show or concert during your visit.

Great, we can see that Llama 3.1 helped choose the right functions and their parameters, and then summarised it all for us.