Cerebras

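Cerebras has developed the world's largest and fastest AI processor, the Wafer-Scale Engine-3 (WSE-3). Notably, the CS-3 system can run large language models such as Llama-3.1-8B and Llama-3.1-70B at extremely high speeds, making it an ideal platform for demanding AI workloads.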

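While it is technically possible to adapt AutoGen to the Cerebras API by updating the base_url, that approach may not fully account for minor differences in parameter support. Using this library's Cerebras client class also lets you track API costs based on actual token usage.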
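
As a rough illustration of what tracking cost from token usage involves, here is a minimal sketch. The `estimate_cost` helper and the prices in `PRICES_PER_MILLION` are hypothetical placeholders, not the library's implementation and not real Cerebras pricing.

```python
# Hypothetical per-million-token prices in USD as (input, output).
# Placeholder values for illustration only, NOT real Cerebras pricing.
PRICES_PER_MILLION = {
    "llama3.1-8b": (1.0, 2.0),
    "llama3.1-70b": (3.0, 4.0),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost in USD of one request from its token counts."""
    if model not in PRICES_PER_MILLION:
        raise ValueError(f"No price configured for model {model!r}")
    input_price, output_price = PRICES_PER_MILLION[model]
    total = prompt_tokens * input_price + completion_tokens * output_price
    return total / 1_000_000


print(estimate_cost("llama3.1-8b", prompt_tokens=1_000_000, completion_tokens=500_000))
```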

For more information about Cerebras Cloud, visit cloud.cerebras.ai. The API reference is available at inference-docs.cerebras.ai.

Requirements

To use Cerebras with AutoGen, install the autogen-agentchat[cerebras] package.

!pip install autogen-agentchat["cerebras"]~=0.2

Getting Started

Cerebras provides a number of models to use. See the list of models.

The sample OAI_CONFIG_LIST below shows how to use the Cerebras client class by setting api_type to cerebras.

[
    {
        "model": "llama3.1-8b",
        "api_key": "your Cerebras API Key goes here",
        "api_type": "cerebras"
    },
    {
        "model": "llama3.1-70b",
        "api_key": "your Cerebras API Key goes here",
        "api_type": "cerebras"
    }
]
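
To show how a config list like the one above can be consumed, the following is a minimal sketch using only the standard library. The `select_configs` helper is hypothetical. AutoGen ships its own loader, `autogen.config_list_from_json`, which would normally be used instead.

```python
import json
from typing import Optional

# The sample OAI_CONFIG_LIST from above, held in a string for illustration.
OAI_CONFIG_LIST = """
[
    {"model": "llama3.1-8b", "api_key": "your Cerebras API Key goes here", "api_type": "cerebras"},
    {"model": "llama3.1-70b", "api_key": "your Cerebras API Key goes here", "api_type": "cerebras"}
]
"""


def select_configs(raw_json: str, api_type: str, model: Optional[str] = None) -> list:
    """Return the entries matching api_type and, if given, model."""
    configs = json.loads(raw_json)
    return [
        c
        for c in configs
        if c.get("api_type") == api_type and (model is None or c.get("model") == model)
    ]


cerebras_configs = select_configs(OAI_CONFIG_LIST, "cerebras")
print([c["model"] for c in cerebras_configs])
```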

Credentials

Get an API key from cloud.cerebras.ai and add it to your environment variables:

export CEREBRAS_API_KEY="your-api-key-here"
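
Once the variable is exported, the configuration can be built from it in Python. The sketch below assumes a hypothetical helper, `build_cerebras_config`, that fails early with a clear message when the key is missing instead of passing None to the client.

```python
import os


def build_cerebras_config(model: str = "llama3.1-8b") -> list:
    """Build a one-entry config list from the CEREBRAS_API_KEY environment variable."""
    api_key = os.environ.get("CEREBRAS_API_KEY")
    if not api_key:
        # Fail fast so a missing key is not discovered in the middle of a chat.
        raise RuntimeError("CEREBRAS_API_KEY is not set; export it before running.")
    return [{"model": model, "api_key": api_key, "api_type": "cerebras"}]
```

Calling `build_cerebras_config("llama3.1-70b")` would then produce a list suitable for the `config_list` entry of `llm_config`.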

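API parameters

The following parameters can be added to your config for the Cerebras API. See the Cerebras API reference for more information on them and their default values.

max_tokens (null, integer >= 0)
seed (number)
stream (True or False)
temperature (number 0..1.5)
top_p (number)

Example: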

[
    {
        "model": "llama3.1-70b",
        "api_key": "your Cerebras API Key goes here",
        "api_type": "cerebras",
        "max_tokens": 10000,
        "seed": 1234,
        "stream": True,
        "temperature": 0.5,
        "top_p": 0.2, # Note: It is recommended to set temperature or top_p but not both.
    }
]
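
The constraints listed above can be checked before a config is used. The following is a minimal sketch with a hypothetical `validate_cerebras_params` helper. It encodes only the ranges stated in this section and is not part of AutoGen or the Cerebras SDK.

```python
from numbers import Real


def _is_number(value) -> bool:
    """True for ints and floats, but not for booleans."""
    return isinstance(value, Real) and not isinstance(value, bool)


def validate_cerebras_params(config: dict) -> list:
    """Return a list of problems found in config; an empty list means none were found."""
    problems = []

    max_tokens = config.get("max_tokens")
    if max_tokens is not None:
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 0:
            problems.append("max_tokens must be null or an integer >= 0")

    seed = config.get("seed")
    if seed is not None and not _is_number(seed):
        problems.append("seed must be a number")

    stream = config.get("stream")
    if stream is not None and not isinstance(stream, bool):
        problems.append("stream must be True or False")

    temperature = config.get("temperature")
    if temperature is not None and not (_is_number(temperature) and 0 <= temperature <= 1.5):
        problems.append("temperature must be a number between 0 and 1.5")

    top_p = config.get("top_p")
    if top_p is not None and not _is_number(top_p):
        problems.append("top_p must be a number")

    return problems


print(validate_cerebras_params({"temperature": 2.0, "stream": "yes"}))
```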

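Two-Agent Coding Example

In this example, we run a two-agent chat in which an AssistantAgent (primarily a coding agent) generates code to count the prime numbers between 1 and 10,000, and the code is then executed.

We'll use Meta's Llama-3.1-70B model, which is suitable for coding.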

import os

from autogen.oai.cerebras import CerebrasClient, calculate_cerebras_cost

config_list = [{"model": "llama3.1-70b", "api_key": os.environ.get("CEREBRAS_API_KEY"), "api_type": "cerebras"}]

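Importantly, we have changed the termination keyword to FINISH and adjusted the system message so that the model does not return that keyword in the same message as the code block.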

from pathlib import Path

from autogen import AssistantAgent, UserProxyAgent
from autogen.coding import LocalCommandLineCodeExecutor

# Setting up the code executor
workdir = Path("coding")
workdir.mkdir(exist_ok=True)
code_executor = LocalCommandLineCodeExecutor(work_dir=workdir)

# Setting up the agents

# The UserProxyAgent will execute the code that the AssistantAgent provides
user_proxy_agent = UserProxyAgent(
    name="User",
    code_execution_config={"executor": code_executor},
    is_termination_msg=lambda msg: "FINISH" in (msg.get("content") or ""),
)

system_message = """You are a helpful AI assistant who writes code and the user executes it.
Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) for the user to execute.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.
IMPORTANT: Wait for the user to execute your code and then you can reply with the word "FINISH". DO NOT OUTPUT "FINISH" after your code block."""

# The AssistantAgent, using Llama-3.1-70B on Cerebras Inference, will take the coding request and return code
assistant_agent = AssistantAgent(
    name="Cerebras Assistant",
    system_message=system_message,
    llm_config={"config_list": config_list},
)

# Start the chat, with the UserProxyAgent asking the AssistantAgent the message
chat_result = user_proxy_agent.initiate_chat(
    assistant_agent,
    message="Provide code to count the number of prime numbers from 1 to 10000.",
)

User (to Cerebras Assistant):

Provide code to count the number of prime numbers from 1 to 10000.

--------------------------------------------------------------------------------
Cerebras Assistant (to User):

To count the number of prime numbers from 1 to 10000, we will utilize a simple algorithm that checks each number in the range to see if it is prime. A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself.

Here's how we can do it using a Python script:

```python
def count_primes(n):
    primes = 0
    for possiblePrime in range(2, n + 1):
        # Assume number is prime until shown it is not. 
        isPrime = True
        for num in range(2, int(possiblePrime ** 0.5) + 1):
            if possiblePrime % num == 0:
                isPrime = False
                break
        if isPrime:
            primes += 1
    return primes

# Counting prime numbers from 1 to 10000
count = count_primes(10000)
print(count)
```

Please execute this code. I will respond with "FINISH" after you provide the result.

--------------------------------------------------------------------------------
Replying as User. Provide feedback to Cerebras Assistant. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:  

>>>>>>>> NO HUMAN INPUT RECEIVED.
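
The termination check passed to the UserProxyAgent can also be written as a standalone function. This is a minimal sketch; `is_finish_message` is a hypothetical name. It treats a missing or None content field as "do not terminate", which guards against messages that carry no text.

```python
def is_finish_message(msg: dict, keyword: str = "FINISH") -> bool:
    """Return True only when the message has text content containing the keyword."""
    content = msg.get("content")
    return isinstance(content, str) and keyword in content


print(is_finish_message({"content": "The answer is 1229. FINISH"}))
print(is_finish_message({"content": None}))
```

Because the check is a plain substring match, a reply that contains both a code block and the keyword would end the chat before the code runs, which is why the system message tells the model not to output FINISH after its code block.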

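Tool Call Example

In this example, instead of writing code, we show how Meta's Llama-3.1-70B model can perform parallel tool calling, where it recommends calling more than one tool at a time.

We'll use a simple travel assistant program with a couple of tools for weather and currency conversion.

We start by importing libraries and setting up our configuration to use Llama-3.1-70B and the cerebras client class.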

import json
import os
from typing import Literal

from typing_extensions import Annotated

import autogen

config_list = [
    {
        "model": "llama3.1-70b",
        "api_key": os.environ.get("CEREBRAS_API_KEY"),
        "api_type": "cerebras",
    }
]

Create our two agents.

# Create the agent for tool calling
chatbot = autogen.AssistantAgent(
    name="chatbot",
    system_message="""
        For currency exchange and weather forecasting tasks,
        only use the functions you have been provided with.
        When you summarize, make sure you've considered ALL previous instructions.
        Output 'HAVE FUN!' when an answer has been provided.
    """,
    llm_config={"config_list": config_list},
)

# Note that we have changed the termination string to be "HAVE FUN!"
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    is_termination_msg=lambda x: x.get("content", "") and "HAVE FUN!" in x.get("content", ""),
    human_input_mode="NEVER",
    max_consecutive_auto_reply=1,
)

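Create the two functions, annotating them so that the descriptions can be passed through to the LLM.

We associate the user proxy with each function using register_for_execution so that it can execute the function, and the chatbot (powered by the LLM) using register_for_llm so that it can pass the function definition to the LLM.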

# Currency Exchange function

CurrencySymbol = Literal["USD", "EUR"]

# Define our function that we expect to call


def exchange_rate(base_currency: CurrencySymbol, quote_currency: CurrencySymbol) -> float:
    if base_currency == quote_currency:
        return 1.0
    elif base_currency == "USD" and quote_currency == "EUR":
        return 1 / 1.1
    elif base_currency == "EUR" and quote_currency == "USD":
        return 1.1
    else:
        raise ValueError(f"Unknown currencies {base_currency}, {quote_currency}")


# Register the function with the agent


@user_proxy.register_for_execution()
@chatbot.register_for_llm(description="Currency exchange calculator.")
def currency_calculator(
    base_amount: Annotated[float, "Amount of currency in base_currency"],
    base_currency: Annotated[CurrencySymbol, "Base currency"] = "USD",
    quote_currency: Annotated[CurrencySymbol, "Quote currency"] = "EUR",
) -> str:
    quote_amount = exchange_rate(base_currency, quote_currency) * base_amount
    return f"{format(quote_amount, '.2f')} {quote_currency}"


# Weather function


# Example function to make available to model
def get_current_weather(location, unit="fahrenheit"):
    """Get the weather for some location"""
    if "chicago" in location.lower():
        return json.dumps({"location": "Chicago", "temperature": "13", "unit": unit})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "55", "unit": unit})
    elif "new york" in location.lower():
        return json.dumps({"location": "New York", "temperature": "11", "unit": unit})
    else:
        return json.dumps({"location": location, "temperature": "unknown", "unit": unit})


# Register the function with the agent


@user_proxy.register_for_execution()
@chatbot.register_for_llm(description="Weather forecast for US cities.")
def weather_forecast(
    location: Annotated[str, "City name"],
) -> str:
    weather_details = get_current_weather(location=location)
    weather = json.loads(weather_details)
    return f"{weather['location']} will be {weather['temperature']} degrees {weather['unit']}"
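
To illustrate how Annotated descriptions like those above can be read back from a function signature, here is a minimal sketch using only the standard typing module (Python 3.9+). The `describe_parameters` helper is hypothetical and is not how AutoGen builds its tool schema internally; it only shows that the descriptions are recoverable at runtime.

```python
from typing import Annotated, Literal, get_args, get_origin, get_type_hints

CurrencySymbol = Literal["USD", "EUR"]


def currency_calculator(
    base_amount: Annotated[float, "Amount of currency in base_currency"],
    base_currency: Annotated[CurrencySymbol, "Base currency"] = "USD",
    quote_currency: Annotated[CurrencySymbol, "Quote currency"] = "EUR",
) -> str:
    return ""  # Body omitted; only the signature matters for this sketch.


def describe_parameters(func) -> dict:
    """Map each parameter name to the first string found in its Annotated metadata."""
    hints = get_type_hints(func, include_extras=True)
    described = {}
    for name, hint in hints.items():
        if name == "return":
            continue
        description = ""
        if get_origin(hint) is Annotated:
            # get_args returns the underlying type followed by the metadata items.
            _base_type, *metadata = get_args(hint)
            description = next((m for m in metadata if isinstance(m, str)), "")
        described[name] = description
    return described


print(describe_parameters(currency_calculator))
```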

We pass through our customer's message and run the chat.

Finally, we ask the LLM to summarise the chat and print that out.

import time

start_time = time.time()

# start the conversation
res = user_proxy.initiate_chat(
    chatbot,
    message="What's the weather in New York and can you tell me how much is 123.45 EUR in USD so I can spend it on my holiday? Throw a few holiday tips in as well.",
    summary_method="reflection_with_llm",
)

end_time = time.time()

print(f"LLM SUMMARY: {res.summary['content']}\n\nDuration: {(end_time - start_time) * 1000}ms")

user_proxy (to chatbot):

What's the weather in New York and can you tell me how much is 123.45 EUR in USD so I can spend it on my holiday? Throw a few holiday tips in as well.

--------------------------------------------------------------------------------
chatbot (to user_proxy):

***** Suggested tool call (210f6ac6d): weather_forecast *****
Arguments: 
{"location": "New York"}
*************************************************************
***** Suggested tool call (3c00ac7d5): currency_calculator *****
Arguments: 
{"base_amount": 123.45, "base_currency": "EUR", "quote_currency": "USD"}
****************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION weather_forecast...

>>>>>>>> EXECUTING FUNCTION currency_calculator...
user_proxy (to chatbot):

user_proxy (to chatbot):

***** Response from calling tool (210f6ac6d) *****
New York will be 11 degrees fahrenheit
**************************************************

--------------------------------------------------------------------------------
user_proxy (to chatbot):

***** Response from calling tool (3c00ac7d5) *****
135.80 USD
**************************************************

--------------------------------------------------------------------------------
chatbot (to user_proxy):

New York will be 11 degrees fahrenheit.
123.45 EUR is equivalent to 135.80 USD.
 
For a great holiday, explore the Statue of Liberty, take a walk through Central Park, or visit one of the many world-class museums. Also, you'll find great food ranging from bagels to fine dining experiences. HAVE FUN!

--------------------------------------------------------------------------------
LLM SUMMARY: New York will be 11 degrees fahrenheit. 123.45 EUR is equivalent to 135.80 USD. Explore the Statue of Liberty, walk through Central Park, or visit one of the many world-class museums for a great holiday in New York.

Duration: 73.97937774658203ms

We can see that the Cerebras Wafer-Scale Engine-3 (WSE-3) completed the query in 74ms, faster than the blink of an eye!
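
The duration above was measured with time.time(). For short intervals, time.perf_counter() is usually the better clock because it is monotonic and has higher resolution. The following is a minimal sketch; `timed_ms` is a hypothetical helper, and the summed range simply stands in for the chat call.

```python
import time


def timed_ms(func, *args, **kwargs):
    """Call func and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


# Stand-in workload; in the example above this would wrap user_proxy.initiate_chat.
result, elapsed_ms = timed_ms(sum, range(1000))
print(f"Result: {result}, Duration: {elapsed_ms:.2f}ms")
```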
