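Generation
Once an Outlines model is built, you can use outlines.generate to generate text. Standard LLM generation is available through outlines.generate.text, alongside the structured generation methods described below. (For a detailed technical explanation of how structured generation works, see the Structured Generation Explanation page.)
Before generating text, you must construct an outlines.model. Example: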
import outlines
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct", device="cuda")
Text generator
generator = outlines.generate.text(model)
result = generator("Question: What's 2+2? Answer:", max_tokens=100)
print(result)
# The answer is 4
# Outlines also supports streaming output
stream = generator.stream("What's 2+2?", max_tokens=5)
# Iterate over the stream directly so the loop ends when generation does
for token in stream:
    print(repr(token))
# '2'
# '+'
# '2'
# ' equals'
# '4'
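Calling next a fixed number of times raises StopIteration if the stream ends early, so it is safer to iterate directly or to bound the loop with itertools.islice. The sketch below illustrates that pattern with a hypothetical stand-in generator, fake_stream, used in place of generator.stream so that it runs without a model. The helper name take_tokens is likewise an assumption for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for generator.stream(...): yields a few tokens."""
    yield from ["2", "+", "2", " equals", "4"]


def take_tokens(stream: Iterable[str], limit: int) -> List[str]:
    """Collect at most `limit` tokens; stops cleanly if the stream ends early."""
    return list(islice(stream, limit))


# Asking for more tokens than the stream holds does not raise.
tokens = take_tokens(fake_stream(), limit=10)
print("".join(tokens))
```

With a real model you would pass the object returned by generator.stream in place of fake_stream().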
Multi-label classification
Outlines lets you do classification by guiding the model so that it can only output one of the specified choices:
import outlines
model = outlines.models.transformers("microsoft/Phi-3-mini-128k-instruct")
generator = outlines.generate.choice(model, ["Blue", "Red", "Yellow"])
color = generator("What is the closest color to Indigo? ")
print(color)
# Blue
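To build intuition for what "can only output one of the specified choices" means, here is a toy, character-level sketch: at each step, only characters that keep the output a prefix of some choice are allowed. This is an illustration under simplifying assumptions, not Outlines' implementation (which works on model tokens), and the function name allowed_next_chars is hypothetical.

```python
from typing import List, Set


def allowed_next_chars(choices: List[str], prefix: str) -> Set[str]:
    """Return the characters that keep `prefix` a prefix of at least one choice."""
    return {
        choice[len(prefix)]
        for choice in choices
        if choice.startswith(prefix) and len(choice) > len(prefix)
    }


choices = ["Blue", "Red", "Yellow"]
print(sorted(allowed_next_chars(choices, "")))    # first character of each choice
print(sorted(allowed_next_chars(choices, "Bl")))  # only "Blue" is still reachable
```

An empty result means the prefix is either a complete choice or cannot lead to any choice.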
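JSON-structured generation
Outlines can guide models so that they always output valid JSON. You can specify the structure with a Pydantic model or with a string containing a JSON Schema: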
from enum import Enum
from pydantic import BaseModel, constr, conint
import outlines
class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"
class Character(BaseModel):
    name: constr(max_length=10)
    age: conint(gt=18, lt=99)
    armor: Armor
    strength: conint(gt=1, lt=100)
model = outlines.models.transformers("microsoft/Phi-3-mini-128k-instruct")
generator = outlines.generate.json(model, Character)
character = generator(
    "Generate a new character for my awesome game: "
    + "name, age (between 1 and 99), armor and strength. "
)
print(character)
# name='Orla' age=21 armor=<Armor.plate: 'plate'> strength=8
import outlines
schema = """{
    "$defs": {
        "Armor": {
            "enum": ["leather", "chainmail", "plate"],
            "title": "Armor",
            "type": "string"
        }
    },
    "properties": {
        "name": {"maxLength": 10, "title": "Name", "type": "string"},
        "age": {"title": "Age", "type": "integer"},
        "armor": {"$ref": "#/$defs/Armor"},
        "strength": {"title": "Strength", "type": "integer"}
    },
    "required": ["name", "age", "armor", "strength"],
    "title": "Character",
    "type": "object"
}"""
model = outlines.models.transformers("microsoft/Phi-3-mini-128k-instruct")
generator = outlines.generate.json(model, schema)
character = generator(
    "Generate a new character for my awesome game: "
    + "name, age (between 1 and 99), armor and strength. "
)
print(character)
# {'name': 'Yuki', 'age': 24, 'armor': 'plate', 'strength': 3}
Note
We recommend limiting the length of string fields when you first test a schema, especially with small models.
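One way to act on this recommendation is to inspect a schema string before handing it to the generator. The sketch below uses only the standard library to parse a shortened version of the schema and report string properties that declare no maxLength. The helper name unbounded_string_fields is hypothetical, and the check only looks at top-level properties.

```python
import json

# Shortened, illustrative schema; only top-level properties are inspected.
schema = """{
  "properties": {
    "name": {"maxLength": 10, "title": "Name", "type": "string"},
    "age": {"title": "Age", "type": "integer"}
  },
  "required": ["name", "age"],
  "title": "Character",
  "type": "object"
}"""


def unbounded_string_fields(schema_str: str) -> list:
    """Return names of string properties that declare no maxLength."""
    parsed = json.loads(schema_str)  # raises json.JSONDecodeError on malformed input
    return [
        name
        for name, spec in parsed.get("properties", {}).items()
        if spec.get("type") == "string" and "maxLength" not in spec
    ]


print(unbounded_string_fields(schema))
```

Parsing with json.loads also catches malformed schema strings early, before any model is loaded.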
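Grammar-structured generation
Outlines also lets you generate text that is valid under any context-free grammar (CFG) written in EBNF format. Grammars can be intimidating, but they are a very powerful tool. They determine the syntax of every programming language, valid chess moves, and molecular structures, and they can help with procedural graphics generation, among other things.
Here is a simple example of a grammar that defines arithmetic operations: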
from outlines import models, generate
arithmetic_grammar = """
    ?start: sum
    ?sum: product
        | sum "+" product   -> add
        | sum "-" product   -> sub
    ?product: atom
        | product "*" atom  -> mul
        | product "/" atom  -> div
    ?atom: NUMBER           -> number
         | "-" atom         -> neg
         | "(" sum ")"
    %import common.NUMBER
    %import common.WS_INLINE
    %ignore WS_INLINE
"""
model = models.transformers("microsoft/Phi-3-mini-128k-instruct")
generator = generate.cfg(model, arithmetic_grammar, max_tokens=100)
result = generator("Question: How can you write 5*5 using addition?\nAnswer:")
print(result)
# 5+5+5+5+5
EBNF grammars can be tedious to write. This is why Outlines provides grammar definitions in the outlines.grammars module:
from outlines import models, generate, grammars
model = models.transformers("microsoft/Phi-3-mini-128k-instruct")
generator = generate.cfg(model, grammars.arithmetic, max_tokens=100)
result = generator("Question: How can you write 5*5 using addition?\nAnswer:")
print(result)
# 5+5+5+5+5
The available grammars are listed here.
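Because the arithmetic examples above return plain strings such as 5+5+5+5+5, it can be useful to check that a result evaluates to the expected value. The sketch below does this with Python's ast module, accepting only numbers, + - * /, unary minus and parentheses. It approximates the arithmetic grammar with Python's own syntax rather than reusing the grammar itself, and the function name evaluate is hypothetical.

```python
import ast
import operator

# Map the AST operator nodes we accept to their arithmetic functions.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str):
    """Evaluate numbers combined with + - * /, unary minus and parentheses."""

    def walk(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        # Anything else (names, calls, attributes, ...) is rejected.
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expression.strip(), mode="eval").body)


print(evaluate("5+5+5+5+5"))
```

Walking the syntax tree instead of calling eval keeps the check safe to run on model output.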
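Regex-structured generation
Slightly simpler, but no less useful, Outlines can generate text that matches a regular expression. For example, to force the model to generate an IP address: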
from outlines import models, generate
model = models.transformers("microsoft/Phi-3-mini-128k-instruct")
regex_str = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
generator = generate.regex(model, regex_str)
result = generator("What is the IP address of localhost?\nIP: ")
print(result)
# 127.0.0.100
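The same pattern can be used outside the generator to confirm what it accepts. The sketch below uses the standard re module with the regular expression from the example above. The helper name is_ipv4 is hypothetical.

```python
import re

# Same pattern as in the generation example: four dot-separated octets in 0-255.
IP_PATTERN = re.compile(
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
)


def is_ipv4(text: str) -> bool:
    """True when the whole string matches the IPv4 pattern."""
    return IP_PATTERN.fullmatch(text) is not None


for candidate in ["127.0.0.100", "256.1.1.1", "10.0.0"]:
    print(candidate, is_ipv4(candidate))
```

Note that the pattern constrains the shape of the output only: 127.0.0.100 is a well-formed address even though localhost is conventionally 127.0.0.1.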
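Generate a given Python type
We provide a shortcut to regex-structured generation for simple use cases. Pass a Python type to the outlines.generate.format function and the LLM will output text that matches this type.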
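To make the link between a Python type and a regular expression concrete, the sketch below pairs a few types with a pattern and a converter, then validates and converts a string. The table TYPE_PATTERNS and the helper parse_as are hypothetical and for illustration only; they are not the patterns Outlines uses internally.

```python
import re
from typing import Any, Callable, Dict, Tuple

# Hypothetical type-to-pattern table for illustration only;
# these are not the patterns Outlines uses internally.
TYPE_PATTERNS: Dict[type, Tuple[str, Callable[[str], Any]]] = {
    int: (r"[+-]?(0|[1-9][0-9]*)", int),
    float: (r"[+-]?(0|[1-9][0-9]*)\.[0-9]+", float),
    bool: (r"(True|False)", lambda s: s == "True"),
}


def parse_as(python_type: type, text: str) -> Any:
    """Check `text` against the pattern for `python_type`, then convert it."""
    pattern, convert = TYPE_PATTERNS[python_type]
    if re.fullmatch(pattern, text) is None:
        raise ValueError(f"{text!r} is not a valid {python_type.__name__}")
    return convert(text)


print(parse_as(int, "67"))
print(parse_as(float, "3.14"))
```

Constraining generation to such a pattern guarantees that the final conversion step cannot fail.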