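Quickstart

Get started parsing documents with LlamaParse, whether you prefer Python, TypeScript, or the web UI. This guide walks you through creating an API key and running your first job.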

Get your API key

🔑 Before you begin: you need an API key to access the LlamaParse service.

Get your API key →

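Choose your setup

Using LlamaParse in the web UI

If you are non-technical or just want to try LlamaParse quickly, the web UI is the easiest way to get started.

Step-by-step workflow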

Go to LlamaCloud
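Choose a parsing preset from Recommended Settings, or switch to Advanced Settings for a custom configuration
Upload your document
Click Parse and view the parsed results directly in your browser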

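Choosing a preset

LlamaParse offers four main presets under Recommended Settings:

Cost Effective – Optimized for speed and cost. Best for text-heavy documents with simple structure.
Agentic – The default option. Works well for documents with images and diagrams, but may struggle with complex layouts.
Agentic Plus – Highest fidelity. Best for complex layouts, tables, and visual structure.
Use-case Oriented – Lists a set of parsing options tailored to specific document types, such as invoices, forms, technical resumes, and scientific papers.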

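Learn more about parsing presets

Advanced settings for Custom Modes

The Advanced Settings option gives you full control over how your documents are parsed. You can choose from a variety of modes, including multimodal and model-specific options.

This is best suited to advanced use cases. Learn more about parsing modes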

Install the package

pip install llama-cloud-services

Parse from the command line

You can parse your first PDF file from the command-line interface with llama-parse [file_paths]. Run llama-parse --help to see the help text.

export LLAMA_CLOUD_API_KEY='llx-...'

# output as text
llama-parse my_file.pdf --result-type text --output-file output.txt

# output as markdown
llama-parse my_file.pdf --result-type markdown --output-file output.md

# output as raw json
llama-parse my_file.pdf --output-raw-json --output-file output.json

Parse in Python

You can also create simple scripts:

from llama_cloud_services import LlamaParse

parser = LlamaParse(
    api_key="llx-...",  # can also be set in your env as LLAMA_CLOUD_API_KEY
    num_workers=4,       # if multiple files passed, split in `num_workers` API calls
    verbose=True,
    language="en",       # optionally define a language, default=en
)

# sync
result = parser.parse("./my_file.pdf")

# sync batch
results = parser.parse(["./my_file1.pdf", "./my_file2.pdf"])

# async (await must run inside an async function or a notebook cell)
result = await parser.aparse("./my_file.pdf")

# async batch
results = await parser.aparse(["./my_file1.pdf", "./my_file2.pdf"])
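
The async calls above use await, which a plain Python script only accepts inside a coroutine. The following is a minimal sketch of one way to drive them with asyncio.run. The helper parse_many and the StubParser class are hypothetical names introduced for illustration; the stub only mimics the aparse method shown above so that the sketch runs without an API key. With the real client you would pass your LlamaParse instance instead.

```python
import asyncio


async def parse_many(parser, paths):
    """Await the parser's async batch call and return whatever it yields."""
    return await parser.aparse(paths)


class StubParser:
    """Hypothetical stand-in that exposes an `aparse` coroutine like the client above."""

    async def aparse(self, paths):
        await asyncio.sleep(0)  # give control back to the event loop once
        return [f"parsed:{path}" for path in paths]


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop.
    results = asyncio.run(
        parse_many(StubParser(), ["./my_file1.pdf", "./my_file2.pdf"])
    )
    print(results)
```

In a notebook an event loop is usually already running, so there you would await parse_many(...) directly rather than call asyncio.run.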

The result object is a fully typed JobResult object. You can interact with it to parse and transform various parts of the result:

# get the llama-index markdown documents
markdown_documents = result.get_markdown_documents(split_by_page=True)

# get the llama-index text documents
text_documents = result.get_text_documents(split_by_page=False)

# get the image documents
image_documents = result.get_image_documents(
    include_screenshot_images=True,
    include_object_images=False,
    # Optional: download the images to a directory
    # (default is to return the image bytes in ImageDocument objects)
    image_download_dir="./images",
)

# access the raw job result
# Items will vary based on the parser configuration
for page in result.pages:
    print(page.text)
    print(page.md)
    print(page.images)
    print(page.layout)
    print(page.structuredData)
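
As a small follow-on to the loop above, here is a sketch that stitches the per-page Markdown into a single string. It relies only on the pages and md attributes shown above; the helper name join_page_markdown and the separator are assumptions made for illustration, and the SimpleNamespace objects are stand-ins so the sketch runs without an API call.

```python
from types import SimpleNamespace
from typing import Iterable


def join_page_markdown(pages: Iterable, separator: str = "\n\n---\n\n") -> str:
    """Concatenate each page's `md` attribute, skipping pages without Markdown."""
    parts = []
    for page in pages:
        md = getattr(page, "md", None)
        if md:
            parts.append(md.strip())
    return separator.join(parts)


# Stand-in pages; with a real result you would pass `result.pages` instead.
fake_pages = [
    SimpleNamespace(md="# Page one\n"),
    SimpleNamespace(md=None),
    SimpleNamespace(md="Second page text"),
]
print(join_page_markdown(fake_pages))
```

Whether a given page carries Markdown depends on the parser configuration, which is why the helper skips empty values instead of failing.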

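That's it! Take a look at the examples below or head to the Python client docs.

Examples

Several end-to-end indexing examples can be found in the client's examples folder.

Install the package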

npm init
npm install -D typescript @types/node

LlamaParse support for TypeScript is provided by the llama-cloud-services package, which the code below imports from, so install it along with dotenv:

npm install llama-cloud-services dotenv

Let's create a parse.ts file and put our dependencies in it:

import {
  LlamaParseReader,
  // we'll add more here later
} from "llama-cloud-services";
import 'dotenv/config'

Now let's create our main function, which will load fun facts about Canada and parse them:

async function main() {
  // save the file linked above as canada.pdf, or change this to match
  const path = "./canada.pdf";

  // set up the llamaparse reader
  const reader = new LlamaParseReader({ resultType: "markdown" });

  // parse the document
  const documents = await reader.loadData(path);

  // print the parsed document
  console.log(documents)
}

main().catch(console.error);

Now run the file:

npx tsx parse.ts

Congratulations! You've parsed the file and should see output that looks like this:

[
  Document {
    id_: '02f5e252-9dca-47fa-80b2-abdd902b911a',
    embedding: undefined,
    metadata: { file_path: './canada.pdf' },
    excludedEmbedMetadataKeys: [],
    excludedLlmMetadataKeys: [],
    relationships: {},
    text: '# Fun Facts About Canada\n' +
      '\n' +
      'We may be known as the Great White North, but
  ...etc...

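You can now use this in your own TypeScript projects. Head over to the TypeScript docs to learn more about LlamaIndex in TypeScript.

Using the REST API

If you would prefer to use a raw API, the REST API lets you integrate parsing into any environment, with no client required. Below are sample endpoints to help you get started.

1. Upload a file and start parsing

Send a document to the API to begin the parsing job: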

curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/v1/parsing/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'

2. Check the status of a parsing job

Use the job_id returned from the upload step to monitor parsing progress:

curl -X 'GET' \
  'https://api.cloud.llamaindex.ai/api/v1/parsing/job/<job_id>' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"

3. Retrieve results in Markdown

Once the job is complete, you can fetch the structured result:

curl -X 'GET' \
  'https://api.cloud.llamaindex.ai/api/v1/parsing/job/<job_id>/result/markdown' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"
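
Steps 2 and 3 can be chained into a simple polling loop. The following is a minimal Python sketch of that idea using only the standard library, assuming you already have a job_id from step 1. The two URLs come from the curl commands above; the helper names, the status response field, and the SUCCESS, ERROR, and CANCELLED status values are assumptions, so check them against the API reference before relying on them.

```python
import json
import os
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.cloud.llamaindex.ai/api/v1/parsing"


def status_url(job_id: str) -> str:
    """URL of the job status endpoint from step 2."""
    return f"{API_BASE}/job/{job_id}"


def markdown_url(job_id: str) -> str:
    """URL of the Markdown result endpoint from step 3."""
    return f"{API_BASE}/job/{job_id}/result/markdown"


def get_json(url: str) -> dict:
    """GET a URL with the bearer token from LLAMA_CLOUD_API_KEY and decode the JSON body."""
    request = urllib.request.Request(
        url,
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {os.environ['LLAMA_CLOUD_API_KEY']}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_markdown(
    job_id: str,
    fetch: Callable[[str], dict] = get_json,
    poll_seconds: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the status endpoint until the job finishes, then fetch the Markdown result."""
    for _ in range(max_attempts):
        status = fetch(status_url(job_id)).get("status")  # assumed field name
        if status == "SUCCESS":  # assumed status value
            return fetch(markdown_url(job_id))
        if status in ("ERROR", "CANCELLED"):  # assumed status values
            raise RuntimeError(f"parsing job {job_id} ended with status {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"parsing job {job_id} did not finish after {max_attempts} polls")
```

The fetch and sleep parameters are injectable so the loop can be exercised with canned responses before pointing it at the live API.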

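See more details in our API reference

Examples

Here is an example notebook for raw API usage

Resources

See Credit Pricing & Usage
Next steps? Check out LlamaExtract to extract structured data from unstructured documents!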