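Subscribe to our newsletter for updates and tips

If you'd like updates on new features and tips on how to use Instructor, you can subscribe to our newsletter below to be notified whenever we publish new content.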

Advanced Topics

AI Development and Optimization

Language Models and Prompting Techniques

Integrations and Tools

Media and Resources

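2024/12/26
in uv
5 min read

Migrating to uv

Why we migrated to uv

We recently migrated from poetry to uv because we wanted to benefit from its many features, such as:

Easier dependency management with automatic caching built in
Noticeably faster CI/CD than poetry, especially when we use the caching functionality provided by the Astral team
A Cargo-style lockfile, which makes it easier to adopt new PEP features as they are released

We spent about 1-2 days on the migration and are happy with the results. On average, our CI/CD jobs now run significantly faster.

Here are some job timings I pulled from our CI/CD runs.

Overall, I'd estimate that once we implemented caching for the individual uv GitHub Actions, we saw roughly a 3x speedup, with jobs taking about 67% less time.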
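The two figures above describe the same improvement: a job that runs 3x faster takes one third of the original time, so it saves about two thirds of it. A minimal sketch of that arithmetic (the helper name `time_reduction` is made up for illustration):

```python
def time_reduction(speedup: float) -> float:
    """Return the fraction of wall-clock time saved for a given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # A job that is `speedup` times faster takes 1/speedup of the original time.
    return 1 - 1 / speedup


# A 3x speedup corresponds to roughly a 67% reduction in job time.
print(f"{time_reduction(3.0):.0%}")  # prints "67%"
```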

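2024/12/11
in OpenAI, Multimodal
5 min read

Extracting metadata from images using structured extraction

Multimodal language models like gpt-4o excel at processing multimodal data, which lets us extract rich, structured metadata from images.

This is particularly valuable in areas like fashion, where we can use these capabilities to understand a user's style preferences from images or even videos. In this post, we'll see how to use instructor to map images to a given product taxonomy so that we can recommend similar products to users.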
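As a rough illustration of what "mapping to a product taxonomy" can look like, here is a dependency-free sketch. The categories, field names, and the `recommend` helper are all hypothetical, and the raw dictionary stands in for what a model might return. With instructor, the same shape would typically be expressed as a Pydantic model passed as `response_model`.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(str, Enum):
    # Hypothetical taxonomy; a real one would come from your product catalog.
    TOPS = "tops"
    BOTTOMS = "bottoms"
    FOOTWEAR = "footwear"


@dataclass
class ImageMetadata:
    category: Category
    colors: list[str] = field(default_factory=list)


@dataclass
class Product:
    name: str
    category: Category
    colors: list[str] = field(default_factory=list)


def parse_metadata(raw: dict) -> ImageMetadata:
    """Validate raw extracted fields against the taxonomy."""
    return ImageMetadata(
        # Category(...) raises ValueError for values outside the taxonomy.
        category=Category(raw["category"]),
        colors=[c.lower() for c in raw.get("colors", [])],
    )


def recommend(item: ImageMetadata, catalog: list[Product]) -> list[str]:
    """Return names of products in the same category that share a color."""
    return [
        p.name
        for p in catalog
        if p.category == item.category and set(p.colors) & set(item.colors)
    ]


catalog = [
    Product("linen shirt", Category.TOPS, ["navy", "white"]),
    Product("denim jeans", Category.BOTTOMS, ["navy"]),
]
item = parse_metadata({"category": "tops", "colors": ["Navy"]})
print(recommend(item, catalog))  # prints "['linen shirt']"
```

Constraining the category to an enum means an out-of-taxonomy answer fails loudly at parse time instead of silently producing an unusable recommendation.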

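2024/12/10
in OpenAI
5 min read

Generating consistent stories with GPT-4o

Language models struggle to generate consistent graphs that have a large number of nodes. Often this is because the graph itself is too large for the model to handle. As a result, the model produces inconsistent graphs with problems such as invalid and disconnected nodes.

In this article, we'll work through a simple example of generating a Choose Your Own Adventure story and look at how a two-stage approach gets around this limitation, allowing us to generate complex DAGs (directed acyclic graphs) with gpt-4o.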
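The excerpt does not spell out the two stages, so the sketch below covers only the failure modes it names: choices that point at invalid nodes, nodes disconnected from the story, and cycles that would break the DAG property. `StoryNode` and `validate_story` are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoryNode:
    node_id: str
    text: str
    choices: list[str] = field(default_factory=list)  # ids of follow-up nodes


def validate_story(nodes: list[StoryNode], root: str) -> list[str]:
    """Return a list of consistency problems; an empty list means a valid DAG."""
    by_id = {n.node_id: n for n in nodes}
    errors = []

    # 1. Every choice must point at a node that actually exists.
    for node in nodes:
        for target in node.choices:
            if target not in by_id:
                errors.append(f"{node.node_id} -> {target}: unknown node")

    # 2. Depth-first search from the root to detect cycles and reachability.
    state = {}  # node_id -> "visiting" or "done"

    def visit(node_id: str) -> None:
        if node_id not in by_id or state.get(node_id) == "done":
            return
        if state.get(node_id) == "visiting":
            errors.append(f"cycle through {node_id}")
            return
        state[node_id] = "visiting"
        for target in by_id[node_id].choices:
            visit(target)
        state[node_id] = "done"

    visit(root)

    # 3. Anything the search never touched is disconnected from the story.
    for node in nodes:
        if node.node_id not in state:
            errors.append(f"{node.node_id}: unreachable from {root}")
    return errors


story = [
    StoryNode("start", "You wake up in a forest.", ["cave", "river"]),
    StoryNode("cave", "The cave is dark.", ["end"]),
    StoryNode("river", "The river is cold.", ["end"]),
    StoryNode("end", "You find your way home."),
]
print(validate_story(story, root="start"))  # prints "[]"
```

A check like this can run on every generated graph, so an inconsistent story is rejected or regenerated rather than shown to a reader.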


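2024/11/21
in Data Analysis, Structured Outputs
4 min read

Using structured outputs to convert messy tables into tidy data

Why is this a problem?

Messy data exports are a common problem. Whether it's multiple headers in a table, implicit relationships that make analysis difficult, or simply merged cells, using instructor with structured outputs makes it easy to convert messy tables into tidy data, even if all you have is an image of the table, as we'll see below.

Let's take the following table as an example. It hides data relationships behind empty cells and implicit repetition, which makes analysis unnecessarily difficult. If we wanted to use it for data analysis, cleaning it up by hand would be a nightmare.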
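To make "empty cells and implicit repetition" concrete, here is a small sketch that forward-fills blank cells so every row carries its own values. The rows are made-up sample data, not the table from the post, and `forward_fill` is a hypothetical helper. This handles only the simplest kind of mess; the post's approach relies on a model producing the tidy rows directly.

```python
def forward_fill(rows: list[dict], columns: list[str]) -> list[dict]:
    """Copy the last seen value into blank cells of the given columns."""
    last_seen = {}
    tidy = []
    for row in rows:
        filled = dict(row)  # do not mutate the caller's data
        for col in columns:
            value = filled.get(col)
            if value in (None, ""):
                # A blank cell implicitly repeats the value from the row above.
                filled[col] = last_seen.get(col)
            else:
                last_seen[col] = value
        tidy.append(filled)
    return tidy


# Made-up export where the region is written only once per group.
messy = [
    {"region": "North", "product": "Widget", "sales": 10},
    {"region": "", "product": "Gadget", "sales": 7},
    {"region": "South", "product": "Widget", "sales": 4},
    {"region": "", "product": "Gadget", "sales": 9},
]

for row in forward_fill(messy, columns=["region"]):
    print(row["region"], row["product"], row["sales"])
```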

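2024/11/19
in Writer SDK
3 min read

Structured outputs with Writer now supported

We're excited to announce that instructor now supports Writer's enterprise-grade LLMs, including their latest Palmyra X 004 model. This integration enables structured outputs and enterprise AI workflows using Writer's powerful language models.

Getting started

First, make sure you've signed up for an account on Writer and obtained an API key using this quickstart guide. Once that's done, install instructor with Writer support by running pip install instructor[writer] in your terminal.

Make sure to set the WRITER_API_KEY environment variable to your Writer API key, or pass the key as an argument to the Writer constructor.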
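The two options above can be combined into one lookup: prefer a key passed explicitly and fall back to the environment variable. This is only a sketch; `resolve_writer_api_key` is a hypothetical helper, not part of instructor or the Writer SDK, and the constructor's exact parameter name is not shown here, so check the SDK documentation before passing the result in.

```python
import os
from typing import Optional


def resolve_writer_api_key(explicit_key: Optional[str] = None) -> str:
    """Return an explicit key if given, otherwise the WRITER_API_KEY variable."""
    key = explicit_key or os.environ.get("WRITER_API_KEY")
    if not key:
        # Fail early with a clear message instead of at the first API call.
        raise RuntimeError(
            "Set the WRITER_API_KEY environment variable or pass a key explicitly"
        )
    return key
```

Failing at startup makes a missing key obvious, rather than surfacing later as an authentication error.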

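2024/11/15
in Gemini, Document Processing
4 min read

Eliminating hallucinations with structured outputs using Gemini

In this post, we'll explore how to use Google's Gemini model together with Instructor to generate accurate citations from PDFs. This approach ensures that answers are grounded in the actual content of the PDF, which reduces the risk of hallucinations.

We'll use Nvidia's 10k report as our example, which you can download at this link.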
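One way to check that an answer is "grounded in the actual content" is to require every cited quote to appear on the page it claims to come from. The sketch below is an assumption about how such a check could look, not the post's implementation. `Citation` and `unsupported_citations` are hypothetical names, and the page text is placeholder text rather than content from the report.

```python
import re
from dataclasses import dataclass


@dataclass
class Citation:
    quote: str  # text the model claims to have copied from the document
    page: int   # page number the model attributes the quote to


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks don't cause mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_citations(
    citations: list[Citation], pages: dict[int, str]
) -> list[Citation]:
    """Return citations whose quote is empty or missing from the cited page."""
    unsupported = []
    for citation in citations:
        quote = _normalize(citation.quote)
        page_text = _normalize(pages.get(citation.page, ""))
        if not quote or quote not in page_text:
            unsupported.append(citation)
    return unsupported


# Placeholder page text for illustration only.
pages = {1: "Revenue increased compared\nto the prior year."}
citations = [
    Citation("revenue increased compared to the prior year", page=1),
    Citation("revenue decreased sharply", page=1),
]
for bad in unsupported_citations(citations, pages):
    print(f"Not found on page {bad.page}: {bad.quote!r}")
```

Exact substring matching is strict by design: a paraphrased quote is reported as unsupported, which is usually the safer default for citations.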

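2024/11/11
in Gemini, Document Processing
3 min read

PDF processing with structured outputs using Gemini

In this post, we'll explore how to use Google's Gemini model with Instructor to analyze the Gemini 1.5 Pro Paper and extract a structured summary.

The problem

Processing PDFs programmatically has always been painful. The typical approaches all have significant drawbacks:

PDF parsing libraries require complex rules and break easily
OCR solutions are slow and error-prone
Specialized PDF APIs are expensive and require additional integration
LLM solutions often need complex document chunking and embedding pipelines

What if we could just hand a PDF to an LLM and get structured data back? With Gemini's multimodal capabilities and Instructor's structured output handling, we can do exactly that.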

Quick setup

First, install the required packages:

pip install "instructor[google-generativeai]"

Then, here's all the code you need:

import instructor
import google.generativeai as genai
from google.ai.generativelanguage_v1beta.types.file import File
from pydantic import BaseModel
import time

# Initialize the client
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    )
)


# Define your output structure
class Summary(BaseModel):
    summary: str


# Upload the PDF
file = genai.upload_file("path/to/your.pdf")

# Wait for the file to finish processing, reporting the state before each retry
while file.state != File.State.ACTIVE:
    print(f"File is still processing, state: {file.state}")
    time.sleep(1)
    file = genai.get_file(file.name)

print(f"File is now active, state: {file.state}")
print(file)

resp = client.chat.completions.create(
    messages=[
        {"role": "user", "content": ["Summarize the following file", file]},
    ],
    response_model=Summary,
)

print(resp.summary)

Raw results:

summary="Gemini 1.5 Pro is a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. It achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state-of-the-art in long-document QA, long-video QA and long-context ASR, and matches or surpasses Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Gemini 1.5 Pro is built to handle extremely long contexts; it has the ability to recall and reason over fine-grained information from up to at least 10M tokens. This scale is unprecedented among contemporary large language models (LLMs), and enables the processing of long-form mixed-modality inputs including entire collections of documents, multiple hours of video, and almost five days long of audio. Gemini 1.5 Pro surpasses Gemini 1.0 Pro and performs at a similar level to 1.0 Ultra on a wide array of benchmarks while requiring significantly less compute to train. It can recall information amidst distractor context, and it can learn to translate a new language from a single set of linguistic documentation. With only instructional materials (a 500-page reference grammar, a dictionary, and ≈ 400 extra parallel sentences) all provided in context, Gemini 1.5 Pro is capable of learning to translate from English to Kalamang, a Papuan language with fewer than 200 speakers, and therefore almost no online presence."