
Run Outlines on Modal

Modal is a serverless platform that lets you easily run code in the cloud, including on GPUs. It comes in quite handy for those of us who don't have a powerful GPU at home, making it quick and easy to provision, configure, and orchestrate cloud infrastructure.

In this guide we will show you how you can use Modal to run programs written with Outlines on GPUs in the cloud.

Requirements

We recommend installing modal and outlines in a virtual environment. You can create one with:

python -m venv venv
source venv/bin/activate

Then install the required packages:

pip install modal outlines

Build the image

First we need to define our container image. If you need access to a gated model, you will need to provide an access token. See the .env call below for how to provide your HuggingFace token.

The best way to set the token is with the HF_TOKEN environment variable. If you prefer not to do that, the code below includes a commented-out line for setting the token directly in the code.
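For example, on Linux or macOS you can export the token in your shell before running Modal (the value shown is a placeholder for your own token):

export HF_TOKEN=<your-huggingface-token>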

from modal import Image, App, gpu
import os

# This creates a modal App object. Here we set the name to "outlines-app".
# There are other optional parameters like modal secrets, schedules, etc.
# See the documentation here: https://modal.com/docs/reference/modal.App
app = App(name="outlines-app")

# Specify a language model to use.
# Another good model to use is "NousResearch/Hermes-2-Pro-Mistral-7B"
language_model = "mistral-community/Mistral-7B-v0.2"

# Please set an environment variable HF_TOKEN with your Hugging Face API token.
# The code below (the .env({...}) part) will copy the token from your local
# environment to the container.
# More info on Image here: https://modal.com/docs/reference/modal.Image
outlines_image = Image.debian_slim(python_version="3.11").pip_install(
    "outlines",
    "transformers",
    "datasets",
    "accelerate",
    "sentencepiece",
).env({
    # This will pull in your HF_TOKEN environment variable if you have one.
    'HF_TOKEN': os.environ['HF_TOKEN']

    # To set the token directly in the code, uncomment the line below and replace
    # 'YOUR_TOKEN' with the HuggingFace access token.
    # 'HF_TOKEN':'YOUR_TOKEN'
})
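
If you would rather not copy the token from your local environment into the image, Modal can also store it as a Secret and attach it to the function at runtime. A minimal sketch, assuming you have created a Modal secret containing an HF_TOKEN key (the secret name "huggingface-secret" is illustrative):

import modal

# Hypothetical: assumes a secret named "huggingface-secret" with an
# HF_TOKEN key was created in the Modal dashboard or via the Modal CLI.
hf_secret = modal.Secret.from_name("huggingface-secret")

# The secret can then be attached to the function instead of baking the
# token into the image:
# @app.function(image=outlines_image, secrets=[hf_secret], ...)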

Setting up the container

When running longer Modal apps, it is recommended to download the language model when the container starts, rather than when the function is called. This will cache the model for future runs.

# This function imports the model from Hugging Face. The modal container
# will call this function when it starts up. This is useful for
# downloading models, setting up environment variables, etc.
def import_model():
    import outlines
    outlines.models.transformers(language_model)

# This line tells the container to run the import_model function when it starts.
outlines_image = outlines_image.run_function(import_model)

Define a schema

We will run the JSON-structured generation example from the README, with the following schema:

# Specify a schema for the character description. In this case,
# we want to generate a character with a name, age, armor, weapon, and strength.
schema = """{
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {
            "title": "Name",
            "maxLength": 10,
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "armor": {"$ref": "#/definitions/Armor"},
        "weapon": {"$ref": "#/definitions/Weapon"},
        "strength": {
            "title": "Strength",
            "type": "integer"
        }
    },
    "required": ["name", "age", "armor", "weapon", "strength"],
    "definitions": {
        "Armor": {
            "title": "Armor",
            "description": "An enumeration.",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string"
        },
        "Weapon": {
            "title": "Weapon",
            "description": "An enumeration.",
            "enum": ["sword", "axe", "mace", "spear", "bow", "crossbow"],
            "type": "string"
        }
    }
}"""

To make the inference work on Modal, we need to wrap the corresponding function in an @app.function decorator. We pass to this decorator the image and the GPU we want the function to run on.

Let's choose an A100 with 80GB memory. Valid GPU options can be found in the Modal documentation.

# Define a function that uses the image we chose, and specify the GPU
# and memory we want to use.
@app.function(image=outlines_image, gpu=gpu.A100(size='80GB'))
def generate(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    # Remember, this function is being executed in the container,
    # so we need to import the necessary libraries here. You should
    # do this with any other libraries you might need.
    import outlines

    # Load the model into memory. The import_model function above
    # should have already downloaded the model, so this call
    # only loads the model into GPU memory.
    model = outlines.models.transformers(
        language_model, device="cuda"
    )

    # Generate a character description based on the prompt.
    # We use the .json generation method -- we provide the
    # - model: the model we loaded above
    # - schema: the JSON schema we defined above
    generator = outlines.generate.json(model, schema)

    # Make sure you wrap your prompt in instruction tags ([INST] and [/INST])
    # to indicate that the prompt is an instruction. Instruction tags can vary
    # by models, so make sure to check the model's documentation.
    character = generator(
        f"<s>[INST]Give me a character description. Describe {prompt}.[/INST]"
    )

    # Print out the generated character.
    print(character)
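
    # Since the schema was passed as a string, `character` is parsed into a
    # plain Python dict, so individual fields can be read directly, e.g.
    # character["name"] or character["strength"]. (If you pass a Pydantic
    # model instead, Outlines returns an instance of that model.)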

We then need to define a local_entrypoint to call our function generate remotely:

@app.local_entrypoint()
def main(
    prompt: str = "Amiri, a 53 year old warrior woman with a sword and leather armor.",
):
    # We use the "generate" function defined above -- note too that we are calling
    # .remote() on the function. This tells modal to run the function in our cloud
    # machine. If you want to run the function locally, you can call .local() instead,
    # though this will require additional setup.
    generate.remote(prompt)

Here the @app.local_entrypoint() decorator defines main as the function to start locally when using the Modal CLI. You can save the above code to example.py (or use this implementation). Now let's see how to run the code in the cloud using the Modal CLI.

Run on the cloud

First install the Modal client from PyPI, if you have not already done so:

pip install modal

You then need to obtain a token from Modal. Run the following command:

modal setup

Once that is set, you can run inference in the cloud using:

modal run example.py
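
Parameters of the local_entrypoint are exposed as CLI flags, so you can also pass your own prompt on the command line (the prompt below is illustrative):

modal run example.py --prompt "A wizard with a crossbow and plate armor."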

You should see the Modal app initialize, and soon afterwards the output of the print function in your terminal. That's it!