Usage#

The command to run a prompt is llm prompt 'your prompt'. This is the default command, so you can use llm 'your prompt' as a shortcut.

Executing a prompt#

These examples use the default OpenAI gpt-4o-mini model, which requires you to first set an OpenAI API key.

You can install LLM plugins to use models from other providers, including openly licensed models you can run directly on your own computer.
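
For example, installing a plugin and then listing the models it adds might look like this (llm-gpt4all is just one of the available plugins, used here for illustration):

llm install llm-gpt4all
llm models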

To run a prompt, streaming tokens as they come in:

llm 'Ten names for cheesecakes'

To disable streaming and only return the response once it has completed:

llm 'Ten names for cheesecakes' --no-stream

To switch from ChatGPT 4o-mini (the default) to GPT-4o:

llm 'Ten names for cheesecakes' -m gpt-4o

You can use -m 4o as an even shorter shortcut.
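
For example, the earlier command can be shortened to:

llm 'Ten names for cheesecakes' -m 4o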

Pass --model <model name> to use a different model. Run llm models to see a list of available models.

You can also send a prompt to standard input, for example:

echo 'Ten names for cheesecakes' | llm

If you send text to standard input and provide arguments, the resulting prompt will consist of the piped content followed by the arguments:

cat myscript.py | llm 'explain this code'

This will run the prompt:

<contents of myscript.py> explain this code

For models that support them, system prompts are a better tool for this kind of prompt.
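
For example, rather than concatenating the instruction onto the piped content, the myscript.py example above could pass the instruction as a system prompt instead (the -s option is covered under system prompts below):

cat myscript.py | llm -s 'explain this code'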

Some models support options. You can pass these using -o/--option name value - for example, to set the temperature to 1.5 run this:

llm 'Ten names for cheesecakes' -o temperature 1.5

Extracting fenced code blocks#

If you are using an LLM to generate code, it can be useful to retrieve just the code it produces without any of the surrounding explanatory text.

The -x/--extract option will scan the response for the first instance of a Markdown fenced code block - something that looks like this:

```python
def my_function():
    # ...
```

It will extract and return just the content of that block, excluding the fenced code delimiters. If there are no fenced code blocks it will return the full response.

Use --xl/--extract-last to return the last fenced code block instead of the first.
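
As an illustrative sketch (these prompts are made up, not from the documentation), extracting just the generated code from a response might look like this:

llm 'Write a Python function that reverses a string' -x
llm 'Write two Python functions: one to reverse a string, one to count vowels' --xl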

The full response, including the explanatory text, is still logged to the database and can be viewed using llm logs -c.

Attachments#

Some models are multi-modal, which means they can accept input in more than just text. GPT-4o and GPT-4o mini can accept images, and models such as Google Gemini 1.5 can accept audio and video as well.

LLM calls these attachments. You can pass attachments using the -a option like this:

llm "describe this image" -a https://static.simonwillison.net/static/2024/pelicans.jpg

Attachments can be passed using URLs or file paths, and you can attach more than one attachment to a single prompt:

llm "extract text" -a image1.jpg -a image2.jpg

You can also pipe an attachment to LLM by using - as the filename:

cat image.jpg | llm "describe this image" -a -

LLM will attempt to automatically detect the content type of the image. If that doesn't work you can instead use the --attachment-type option (--at for short), which takes the URL/path plus an explicit content type:

cat myfile | llm "describe this image" --at - image/jpeg

System prompts#

You can use -s/--system '...' to set a system prompt.

llm 'SQL to calculate total sales by month' \
  --system 'You are an exaggerated sentient cheesecake that knows SQL and talks about cheesecake a lot'

This is useful for piping content to standard input, for example:

curl -s 'https://simonwillison.net/2023/May/15/per-interpreter-gils/' | \
  llm -s 'Suggest topics for this post as a JSON array'

Or to generate a description of the changes made to a Git repository since the last commit:

git diff | llm -s 'Describe these changes'

Different models support system prompts in different ways.

The OpenAI models are particularly good at using system prompts as instructions for how they should process additional input sent as part of the regular prompt.

Other models might use system prompts to change the default voice and attitude of the model.
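
For example (an illustrative prompt, not taken from the documentation):

llm 'Describe a good breakfast' -s 'You are a sentient cheesecake'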

System prompts can be saved as templates to create reusable tools. For example, you can create a template called pytest like this:

llm -s 'write pytest tests for this code' --save pytest

Then use the new template like this:

cat llm/utils.py | llm -t pytest

See prompt templates for more.

Continuing a conversation#

By default, the tool will start a new conversation each time you run it.

You can opt to continue the previous conversation by passing the -c/--continue option:

llm 'More names' -c

This will re-send the prompts and responses from the previous conversation as part of the call to the language model. Note that this can add up quickly in terms of tokens, especially if you are using expensive models.

--continue will automatically use the same model as the conversation you are continuing, even if you omit the -m/--model option.
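
For example, this hypothetical sequence would run both prompts against gpt-4o, even though the second command omits the -m option:

llm 'Ten names for cheesecakes' -m gpt-4o
llm 'Five more names' -c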

To continue a conversation other than the most recent one, use the --cid/--conversation option:

llm 'More names' --cid 01h53zma5txeby33t1kbe3xk8q

You can find these conversation IDs using the llm logs command.
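
For example, recent conversations and their IDs can be reviewed with the logs command - here the -n 3 option (assumed) limits output to the three most recent entries:

llm logs -n 3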

Tips for using LLM with Bash or Zsh#

To learn more about your computer's operating system based on the output of uname -a, run this:

llm "Tell me about my operating system: $(uname -a)"

This pattern of using $(command) inside a double-quoted string is a useful way to quickly assemble prompts.

Completion prompts#

Some models are completion models - rather than being tuned to respond to chat-style prompts, they are designed to complete a sentence or paragraph.

An example of this is the gpt-3.5-turbo-instruct OpenAI model.

You can prompt that model in the same way as the chat models, but be aware that the prompt format that works best is likely to differ.

llm -m gpt-3.5-turbo-instruct 'Reasons to tame a wild beaver:'

Starting an interactive chat#

The llm chat command starts an ongoing interactive chat with a model.

This is particularly useful for models that run on your own machine, since it saves them from having to be loaded into memory each time a new prompt is added to the conversation.

Run llm chat, optionally with a -m model_id, to start a chat conversation:

llm chat -m chatgpt

Each chat starts a new conversation. A record of each conversation can be accessed through the logs.

You can pass -c to start a conversation as a continuation of your most recent prompt. This will automatically use the most recently used model:

llm chat -c

For models that support them, you can pass options using -o/--option:

llm chat -m gpt-4 -o temperature 0.5

You can pass a system prompt to be used for your chat conversation:

llm chat -m gpt-4 -s 'You are a sentient cheesecake'

You can also pass a template - useful for creating chat personas that you want to return to.

Here's how to create a template for your GPT-4 powered cheesecake:

llm --system 'You are a sentient cheesecake' -m gpt-4 --save cheesecake

Now you can start a new chat with your cheesecake any time you like using this:

llm chat -t cheesecake
Chatting with gpt-4
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> who are you?
I am a sentient cheesecake, meaning I am an artificial
intelligence embodied in a dessert form, specifically a
cheesecake. However, I don't consume or prepare foods
like humans do, I communicate, learn and help answer
your queries.

Type 'quit' or 'exit' followed by <enter> to end a chat session.

Sometimes you may want to paste multiple lines of text into a chat at once - for example when debugging an error message.

To do that, type !multi to start a multi-line input. Type or paste your text, then type !end and hit <enter> to finish.

If the text you are pasting might itself include a !end line, you can set a custom delimiter using !multi abc followed by !end abc at the end:

Chatting with gpt-4
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> !multi custom-end
 Explain this error:

   File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>

 !end custom-end

Listing available models#

The llm models command lists every model that can be used with LLM, along with their aliases. This includes models that have been installed using plugins.

llm models

Example output:

OpenAI Chat: gpt-4o (aliases: 4o)
OpenAI Chat: gpt-4o-mini (aliases: 4o-mini)
OpenAI Chat: o1-preview
OpenAI Chat: o1-mini
GeminiPro: gemini-1.5-pro-002
GeminiPro: gemini-1.5-flash-002
...

Add one or more -q term options to search for models matching all of those search terms:

llm models -q gpt-4o
llm models -q 4o -q mini

Add --options to see documentation for the options supported by each model:

llm models --options

Output:

OpenAI Chat: gpt-4o (aliases: 4o)
  Options:
    temperature: float
      What sampling temperature to use, between 0 and 2. Higher values like
      0.8 will make the output more random, while lower values like 0.2 will
      make it more focused and deterministic.
    max_tokens: int
      Maximum number of tokens to generate.
    top_p: float
      An alternative to sampling with temperature, called nucleus sampling,
      where the model considers the results of the tokens with top_p
      probability mass. So 0.1 means only the tokens comprising the top 10%
      probability mass are considered. Recommended to use top_p or
      temperature but not both.
    frequency_penalty: float
      Number between -2.0 and 2.0. Positive values penalize new tokens based
      on their existing frequency in the text so far, decreasing the model's
      likelihood to repeat the same line verbatim.
    presence_penalty: float
      Number between -2.0 and 2.0. Positive values penalize new tokens based
      on whether they appear in the text so far, increasing the model's
      likelihood to talk about new topics.
    stop: str
      A string where the API will stop generating further tokens.
    logit_bias: dict, str
      Modify the likelihood of specified tokens appearing in the completion.
      Pass a JSON string like '{"1712":-100, "892":-100, "1489":-100}'
    seed: int
      Integer seed to attempt to sample deterministically
    json_object: boolean
      Output a valid JSON object {...}. Prompt must mention JSON.
  Attachment types:
    image/gif, image/jpeg, image/png, image/webp
OpenAI Chat: chatgpt-4o-latest (aliases: chatgpt-4o)
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
  Attachment types:
    image/gif, image/jpeg, image/png, image/webp
OpenAI Chat: gpt-4o-mini (aliases: 4o-mini)
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
  Attachment types:
    image/gif, image/jpeg, image/png, image/webp
OpenAI Chat: gpt-4o-audio-preview
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
  Attachment types:
    audio/mpeg, audio/wav
OpenAI Chat: gpt-4o-audio-preview-2024-12-17
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
  Attachment types:
    audio/mpeg, audio/wav
OpenAI Chat: gpt-4o-audio-preview-2024-10-01
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
  Attachment types:
    audio/mpeg, audio/wav
OpenAI Chat: gpt-4o-mini-audio-preview
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
  Attachment types:
    audio/mpeg, audio/wav
OpenAI Chat: gpt-4o-mini-audio-preview-2024-12-17
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
  Attachment types:
    audio/mpeg, audio/wav
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
OpenAI Chat: gpt-4-1106-preview
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
OpenAI Chat: gpt-4-0125-preview
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
OpenAI Chat: gpt-4-turbo-2024-04-09
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
OpenAI Chat: gpt-4-turbo (aliases: gpt-4-turbo-preview, 4-turbo, 4t)
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
OpenAI Chat: o1
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
    reasoning_effort: str
  Attachment types:
    image/gif, image/jpeg, image/png, image/webp
OpenAI Chat: o1-2024-12-17
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
    reasoning_effort: str
  Attachment types:
    image/gif, image/jpeg, image/png, image/webp
OpenAI Chat: o1-preview
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
OpenAI Chat: o1-mini
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
OpenAI Chat: o3-mini
  Options:
    temperature: float
    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    stop: str
    logit_bias: dict, str
    seed: int
    json_object: boolean
    reasoning_effort: str
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)
  Options:
    temperature: float
      What sampling temperature to use, between 0 and 2. Higher values like
      0.8 will make the output more random, while lower values like 0.2 will
      make it more focused and deterministic.
    max_tokens: int
      Maximum number of tokens to generate.
    top_p: float
      An alternative to sampling with temperature, called nucleus sampling,
      where the model considers the results of the tokens with top_p
      probability mass. So 0.1 means only the tokens comprising the top 10%
      probability mass are considered. Recommended to use top_p or
      temperature but not both.
    frequency_penalty: float
      Number between -2.0 and 2.0. Positive values penalize new tokens based
      on their existing frequency in the text so far, decreasing the model's
      likelihood to repeat the same line verbatim.
    presence_penalty: float
      Number between -2.0 and 2.0. Positive values penalize new tokens based
      on whether they appear in the text so far, increasing the model's
      likelihood to talk about new topics.
    stop: str
      A string where the API will stop generating further tokens.
    logit_bias: dict, str
      Modify the likelihood of specified tokens appearing in the completion.
      Pass a JSON string like '{"1712":-100, "892":-100, "1489":-100}'
    seed: int
      Integer seed to attempt to sample deterministically
    logprobs: int
      Include the log probabilities of most likely N per token
Default: gpt-4o-mini

When running a prompt you can pass the full model name or any of the aliases to the -m/--model option:

llm -m 4o \
  'As many names for cheesecakes as you can think of, with detailed descriptions'