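Parser

A parser is the interpreter of the LLM output. We have three types of parsers:

String parsers: these simply convert a string to the desired data type. They are located at core.string_parser.
Output parsers: these orchestrate both parsing and output formatting (yaml, json, etc.). They are located at components.output_parsers.outputs. JsonOutputParser and YamlOutputParser work with DataClass to produce structured output.
DataClass parser: built on top of YamlOutputParser and JsonOutputParser, DataClassParser is the most compatible option for working with DataClass to produce structured output.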

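Context

LLMs output text as strings. Parsing is the process of extracting that string and converting it into the data structure a use case requires. The desired data structure can be:

Simple data types such as string, int, float, and boolean.
Complex data types such as lists, dicts, or data class instances.
Code such as Python, SQL, and html.

In principle, the output can be converted to whatever format the use case requires. This is an important step for LLM applications that interact with the outside world, for example:

Converting to int to support classification and to float to support regression.
Converting to a list to support multiple-choice selection.
Converting to json/yaml, which is extracted into a dict and optionally converted further into a data class instance, to support cases such as function calls.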

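Scope and Design

Right now, we aim to cover both simple and complex data types, but not code.

Parse

The following lists the scope of parsing we currently support: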

int_str = "42"
float_str = "42.0"
boolean_str = "True"  # json works with true/false, yaml works for both True/False and true/false
None_str = "None"
Null_str = "null"  # json works with null; PyYAML's safe_load maps null to None but keeps "None" as a string
dict_str = '{"key": "value"}'
list_str = '["key", "value"]'
nested_dict_str = (
    '{"name": "John", "age": 30, "attributes": {"height": 180, "weight": 70}}'
)
yaml_dict_str = "key: value"
yaml_nested_dict_str = (
    "name: John\nage: 30\nattributes:\n  height: 180\n  weight: 70"
)
yaml_list_str = "- key\n- value"

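In Python, there are various ways to parse a string: built-in functions such as int, float, and bool handle simple types. For complex types such as dicts, lists, and nested dicts, we can use ast.literal_eval and json.loads(). However, none of them is as robust as yaml.safe_load. Yaml can:

Parse True/False and ‘true/false’ to a boolean.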
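Parse ‘null’ to None. With PyYAML's safe_load, the bare word None is returned as the string 'None' rather than the None object.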
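Handle nested dicts and lists in both yaml and json format.

Thus, we use yaml.safe_load as the last resort for robust parsing of complex data types into List and Dict, and we use int, float, and bool for simple data types.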
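
The trade-off between json.loads and ast.literal_eval can be checked with the standard library alone. The sketch below is illustrative: `try_parse` is a hypothetical helper that is not part of AdalFlow, and the PyYAML calls appear only as comments because PyYAML is a third-party package.

```python
import ast
import json


def try_parse(parse_fn, text):
    """Return the parsed value, or the name of the exception that was raised."""
    try:
        return parse_fn(text)
    except (ValueError, SyntaxError) as exc:
        return type(exc).__name__


# json.loads only understands JSON literals (true/false/null).
print(try_parse(json.loads, "true"))
print(try_parse(json.loads, "True"))  # fails with JSONDecodeError

# ast.literal_eval only understands Python literals (True/False/None).
print(try_parse(ast.literal_eval, "True"))
print(try_parse(ast.literal_eval, "true"))  # fails with ValueError

# Both accept a dict written with double-quoted keys and string values.
print(try_parse(json.loads, '{"key": "value"}'))
print(try_parse(ast.literal_eval, '{"key": "value"}'))

# With PyYAML installed, yaml.safe_load("True") and yaml.safe_load("true")
# both return True, which is why it serves as the fallback.
```

Each standard-library parser rejects the other's spelling of booleans, which is the gap the yaml fallback is meant to close.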

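Parser

Our parsers are located at core.string_parser. They handle both extracting and parsing to Python object types, and they are designed to be robust.

Parser Class	Target Python Object	Description
`BooleanParser`	`bool`	Extracts the first boolean value from the text with `bool`. Supports both ‘True/False’ and ‘true/false’.
`IntParser`	`int`	Extracts the first integer value from the text with `int`.
`FloatParser`	`float`	Extracts the first float value from the text with `float`.
`ListParser`	`list`	Extracts ‘[]’ and parses the first list string from the text. Uses both json.loads and yaml.safe_load.
`JsonParser`	`dict`	Extracts ‘[]’ and ‘{}’ and parses the JSON string from the text. It uses yaml.safe_load for robust parsing.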
`YamlParser`	`dict`	Extracts the content marked ‘yaml’ or ‘yml’, or the whole string, and parses the YAML string from the text.

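Data Class Instance

If your parsed object is a dictionary, you can define and use a DataClass instance. With the from_dict method, you can easily convert the dictionary to a data class instance.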
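
As a rough illustration of the dictionary-to-instance step, the sketch below uses a plain standard-library dataclass. `Person` and its `from_dict` are hypothetical stand-ins written for this example; AdalFlow's own DataClass.from_dict may behave differently, for instance with nested types.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    """Hypothetical stand-in for a DataClass subclass."""

    name: str
    age: int

    @classmethod
    def from_dict(cls, data: dict) -> "Person":
        # Keep only keys that match declared fields, so extra keys in the
        # parsed dictionary do not break construction.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


parsed = {"name": "John", "age": 30, "extra": "ignored"}
person = Person.from_dict(parsed)
print(person)
```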

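Output Parsers

The parsers above do not come with output format instructions. Thus, we created OutputParser to orchestrate both formatting and parsing. It is an abstract component with two main methods:

format_instructions: generates the output format instructions for the prompt.
call: parses the output string to the desired Python object.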
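
The two-method contract can be pictured with a small standard-library sketch. `OutputParserSketch` and `TinyJsonOutputParser` are hypothetical names invented for this illustration, and they are not AdalFlow's classes; the real OutputParser is a component with more behavior than is shown here.

```python
import json
from abc import ABC, abstractmethod
from typing import Any


class OutputParserSketch(ABC):
    """Hypothetical interface mirroring the two methods described above."""

    @abstractmethod
    def format_instructions(self) -> str:
        """Return text for the prompt that describes the output format."""

    @abstractmethod
    def call(self, input: str) -> Any:
        """Parse the raw LLM output string into a Python object."""


class TinyJsonOutputParser(OutputParserSketch):
    def __init__(self, schema: dict):
        self.schema = schema

    def format_instructions(self) -> str:
        return (
            "Your output should be formatted as a standard JSON instance "
            "with the following schema:\n" + json.dumps(self.schema, indent=4)
        )

    def call(self, input: str) -> Any:
        return json.loads(input)


parser = TinyJsonOutputParser({"id": "(int)", "name": "(str)"})
print(parser.format_instructions())
print(parser.call('{"id": 2, "name": "Jane"}'))
```

Keeping both methods on one object means the instructions sent to the model and the parsing applied to its reply are defined in the same place.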

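If your target is a dict object, DataClass already helps describe any data class type and instance so that it can be used easily when interacting with LLMs. Thus, JsonOutputParser and YamlOutputParser both take the following arguments:

data_class: the DataClass type.
examples: examples of data class instances, if you want to show examples in the prompt.
exclude: the fields to exclude from both the data format and the examples, a way to tell format_instructions which fields of the data class are output fields.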

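DataClass Parser

To make things even easier for developers, we created DataClassParser, which understands a DataClass's __input_fields__ and __output_fields__. This is especially helpful when working with a training dataset, where both inputs and outputs are present. Users do not have to use exclude/include to specify the output fields; the parser infers them automatically from the DataClass instance.

Below is an overview of its key components and functionality.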

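Method	Description	Details
`__init__(data_class: DataClass, return_data_class: bool = False, format_type: Literal["yaml", "json"] = "json")`	Initializes the DataClassParser	Takes a DataClass type, whether to return the DataClass instance after parsing, and the output format type (JSON or YAML).
`get_input_format_str() -> str`	Returns formatted instructions for the input data	Provides a string representation of the input fields defined in the DataClass.
`get_output_format_str() -> str`	Returns formatted instructions for the output data	Generates a schema string for the output fields of the DataClass.
`get_input_str(input: DataClass) -> str`	Formats the input data as a string	Converts a DataClass instance to either JSON or YAML based on the specified format type.
`get_task_desc_str() -> str`	Returns the task description string	Retrieves the task description associated with the DataClass, useful for context in LLM prompts.
`get_examples_str(examples: List[DataClass], include: Optional[IncludeType] = None, exclude: Optional[ExcludeType] = None) -> str`	Formats a list of example DataClass instances	Generates a formatted string representation of the examples, honoring the specified `include/exclude` parameters.
`call(input: str) -> Any`	Parses the output string to the desired format and returns the parsed output	Handles both JSON and YAML parsing, converting to the corresponding DataClass if specified.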
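
To make the input/output split concrete, here is a minimal sketch with a plain standard-library dataclass. `Sample` and `select_fields` are hypothetical and exist only for this illustration; they show the idea of reading `__input_fields__` and `__output_fields__`, not how DataClassParser is implemented.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Sample:
    """Plain dataclass standing in for a DataClass subclass (assumption)."""

    question: str
    answer: str

    # Class attributes without annotations are not dataclass fields.
    __input_fields__ = ["question"]
    __output_fields__ = ["answer"]


def select_fields(instance, names):
    """Return only the named fields of a dataclass instance as a dict."""
    data = asdict(instance)
    return {name: data[name] for name in names}


sample = Sample(question="What is 2 + 2?", answer="4")
input_str = json.dumps(select_fields(sample, Sample.__input_fields__))
output_str = json.dumps(select_fields(sample, Sample.__output_fields__))
print(input_str)
print(output_str)
```

Because the split is declared once on the class, the same training example can be rendered as a prompt input and as an expected output without passing include/exclude lists around.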

Parser in Action

All of the parsers are straightforward to use.

BooleanParser

from adalflow.core.string_parser import BooleanParser

bool_str = "True"
bool_str_2 = "False"
bool_str_3 = "true"
bool_str_4 = "false"
bool_str_5 = "1"  # will fail
bool_str_6 = "0"  # will fail
bool_str_7 = "yes"  # will fail
bool_str_8 = "no"  # will fail

# it will all return True/False
parser = BooleanParser()
print(parser(bool_str))
print(parser(bool_str_2))
print(parser(bool_str_3))
print(parser(bool_str_4))

The printout will be:

True
False
True
False

BooleanParser will not work for ‘1’, ‘0’, ‘yes’, or ‘no’, as they are not standard boolean values.

IntParser

from adalflow.core.string_parser import IntParser

int_str = "42"
int_str_2 = "42.0"
int_str_3 = "42.7"
int_str_4 = "the answer is 42.75"

# it will all return 42
parser = IntParser()
print(parser(int_str))
print(parser(int_str_2))
print(parser(int_str_3))
print(parser(int_str_4))

Each call prints 42: IntParser returns the integer value of the first number in the string, even if it is written as a float.

FloatParser

from adalflow.core.string_parser import FloatParser

float_str = "42.0"
float_str_2 = "42"
float_str_3 = "42.7"
float_str_4 = "the answer is 42.75"

# each call returns the first number in the string as a float
parser = FloatParser()
print(parser(float_str))
print(parser(float_str_2))
print(parser(float_str_3))
print(parser(float_str_4))

FloatParser returns the float value of the first number in the string, even if it is written as an integer.
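
The "first number in the string" behavior described for IntParser and FloatParser can be approximated with a regular expression. This is a simplified sketch under stated assumptions: the pattern and the helper names `first_number`, `first_int`, and `first_float` are invented here and are not AdalFlow's implementation.

```python
import re

# Simplified pattern (assumption): optional sign, digits, optional fraction.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def first_number(text: str) -> str:
    """Return the first numeric substring, or raise ValueError if none exists."""
    match = _NUMBER.search(text)
    if match is None:
        raise ValueError(f"No number found in: {text!r}")
    return match.group()


def first_int(text: str) -> int:
    # Go through float so that "42.75" is truncated to 42.
    return int(float(first_number(text)))


def first_float(text: str) -> float:
    return float(first_number(text))


print(first_int("the answer is 42.75"))
print(first_float("42"))
```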

ListParser

from adalflow.core.string_parser import ListParser

list_str = '["key", "value"]'
list_str_2 = 'prefix["key", 2]...'
list_str_3 = '[{"key": "value"}, {"key": "value"}]'

parser = ListParser()
print(parser(list_str))
print(parser(list_str_2))
print(parser(list_str_3))

The output will be:

['key', 'value']
['key', 2]
[{'key': 'value'}, {'key': 'value'}]
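
The second example shows a list being recovered from text that has a prefix and a suffix. One way to picture that extraction step is bracket matching followed by json.loads. The helper `extract_first_list` below is hypothetical, and it is deliberately naive: it does not account for brackets that appear inside string values.

```python
import json


def extract_first_list(text: str) -> list:
    """Find the first balanced [...] span in text and parse it as JSON."""
    start = text.find("[")
    if start == -1:
        raise ValueError("No '[' found in text")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "[":
            depth += 1
        elif text[i] == "]":
            depth -= 1
            if depth == 0:
                # json.JSONDecodeError is a subclass of ValueError.
                return json.loads(text[start : i + 1])
    raise ValueError("Unbalanced brackets in text")


print(extract_first_list('prefix["key", 2]...'))
```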

JsonParser

Even though it can work on lists, it is better to use it only for dicts.

from adalflow.core.string_parser import JsonParser

dict_str = '{"key": "value"}'
nested_dict_str = (
    '{"name": "John", "age": 30, "attributes": {"height": 180, "weight": 70}}'
)
list_str = '["key", 2]'
list_dict_str = '[{"key": "value"}, {"key": "value"}]'

parser = JsonParser()
print(parser)
print(parser(dict_str))
print(parser(nested_dict_str))
print(parser(list_str))
print(parser(list_dict_str))

The output will be:

{'key': 'value'}
{'name': 'John', 'age': 30, 'attributes': {'height': 180, 'weight': 70}}
['key', 2]
[{'key': 'value'}, {'key': 'value'}]

YamlParser

Even though it works on almost all of the previous examples, it is better to use it for yaml-formatted dicts.

from adalflow.core.string_parser import YamlParser

yaml_dict_str = "key: value"
yaml_nested_dict_str = (
    "name: John\nage: 30\nattributes:\n  height: 180\n  weight: 70"
)
yaml_list_str = "- key\n- value"

parser = YamlParser()
print(parser)
print(parser(yaml_dict_str))
print(parser(yaml_nested_dict_str))
print(parser(yaml_list_str))

The output will be:

{'key': 'value'}
{'name': 'John', 'age': 30, 'attributes': {'height': 180, 'weight': 70}}
['key', 'value']

Note

All parsers raise ValueError if they fail at any step. Developers should handle it accordingly.
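
One possible way to handle that ValueError is a small wrapper that falls back to a default value. In this sketch, `parse_or_default` and `strict_bool` are hypothetical helpers; `strict_bool` merely stands in for a parser that raises ValueError, so the example runs without AdalFlow installed.

```python
from typing import Any, Callable

_BOOLEANS = {"True": True, "true": True, "False": False, "false": False}


def strict_bool(text: str) -> bool:
    """Stand-in parser (assumption): accepts only True/False/true/false."""
    try:
        return _BOOLEANS[text.strip()]
    except KeyError:
        raise ValueError(f"Not a boolean: {text!r}") from None


def parse_or_default(parser: Callable[[str], Any], text: str, default: Any) -> Any:
    """Call a parser and fall back to a default when it raises ValueError."""
    try:
        return parser(text)
    except ValueError:
        return default


print(parse_or_default(strict_bool, "true", default=None))
print(parse_or_default(strict_bool, "yes", default=None))
```

Whether to fall back, retry the LLM call, or surface the error depends on the application; the wrapper only shows where the exception would be caught.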

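Output Parsers in Action

We will create the following simple DataClass with one example, and we will demonstrate how to use JsonOutputParser and YamlOutputParser to parse another example into a dict object.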

from dataclasses import dataclass, field
from adalflow.core import DataClass

@dataclass
class User(DataClass):
    id: int = field(default=1, metadata={"description": "User ID"})
    name: str = field(default="John", metadata={"description": "User name"})

user_example = User(id=1, name="John")

JsonOutputParser

Here is how to use JsonOutputParser:

from adalflow.components.output_parsers import JsonOutputParser

parser = JsonOutputParser(data_class=User, examples=[user_example])
print(parser)

Its structure is:

JsonOutputParser(
    data_class=User, examples=[json_output_parser.<locals>.User(id=1, name='John')], exclude_fields=None
    (json_output_format_prompt): Prompt(
        template: Your output should be formatted as a standard JSON instance with the following schema:
        ```
        {{schema}}
        ```
        {% if example %}
        Examples:
        ```
        {{example}}
        ```
        {% endif %}
        -Make sure to always enclose the JSON output in triple backticks (```). Please do not add anything other than valid JSON output!
        -Use double quotes for the keys and string values.
        -Follow the JSON formatting conventions., prompt_variables: ['example', 'schema']
    )
    (output_processors): JsonParser()
)

The output format string will be:

Your output should be formatted as a standard JSON instance with the following schema:
```
{
    "id": " (int) (optional)",
    "name": " (str) (optional)"
}
```
Examples:
```
{
    "id": 1,
    "name": "John"
}
________
```
-Make sure to always enclose the JSON output in triple backticks (```). Please do not add anything other than valid JSON output!
-Use double quotes for the keys and string values.
-Follow the JSON formatting conventions.

Call the parser with the following string:

user_to_parse = '{"id": 2, "name": "Jane"}'
parsed_user = parser(user_to_parse)
print(parsed_user)

The output will be:

{'id': 2, 'name': 'Jane'}

YamlOutputParser

The steps are exactly the same as for JsonOutputParser.

from adalflow.components.output_parsers import YamlOutputParser

parser = YamlOutputParser(data_class=User, examples=[user_example])
print(parser)

Its structure is:

YamlOutputParser(
data_class=<class '__main__.yaml_output_parser.<locals>.User'>, examples=[yaml_output_parser.<locals>.User(id=1, name='John')]
(yaml_output_format_prompt): Prompt(
    template: Your output should be formatted as a standard YAML instance with the following schema:
    ```
    {{schema}}
    ```
    {% if example %}
    Examples:
    ```
    {{example}}
    ```
    {% endif %}

    -Make sure to always enclose the YAML output in triple backticks (```). Please do not add anything other than valid YAML output!
    -Follow the YAML formatting conventions with an indent of 2 spaces.
    -Quote the string values properly., prompt_variables: ['schema', 'example']
)
(output_processors): YamlParser()
)

The output format string will be:

Your output should be formatted as a standard YAML instance with the following schema:
```
id:  (int) (optional)
name:  (str) (optional)
```
Examples:
```
id: 1
name: John

________
```

-Make sure to always enclose the YAML output in triple backticks (```). Please do not add anything other than valid YAML output!
-Follow the YAML formatting conventions with an indent of 2 spaces.
-Quote the string values properly.

Now, let's parse the following string:

user_to_parse = "id: 2\nname: Jane"
parsed_user = parser(user_to_parse)
print(parsed_user)

The output will be:

{'id': 2, 'name': 'Jane'}
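
The format instructions ask the model to enclose its YAML in triple backticks, so a real reply is usually wrapped rather than bare like the string above. The sketch below shows one way the fenced content could be pulled out before parsing. `extract_yaml_text` is a hypothetical helper, not AdalFlow's code, and it stops short of calling a YAML loader so that it depends only on the standard library.

```python
import re

FENCE = "`" * 3  # the triple-backtick marker, built here to keep the example readable

# Matches a block opened with the fence plus an optional yaml/yml tag.
_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:yaml|yml)?\s*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def extract_yaml_text(llm_output: str) -> str:
    """Return the fenced block's content, or the whole string if unfenced."""
    match = _BLOCK.search(llm_output)
    if match:
        return match.group(1).strip()
    return llm_output.strip()


reply = f"Here is the user:\n{FENCE}yaml\nid: 2\nname: Jane\n{FENCE}"
print(extract_yaml_text(reply))
```

The returned text could then be handed to a YAML loader such as yaml.safe_load from PyYAML.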

DataclassParser in Action

First, let's create a new data class with both input and output fields.

@dataclass
class SampleDataClass(DataClass):
    description: str = field(metadata={"desc": "A sample description"})
    category: str = field(metadata={"desc": "Category of the sample"})
    value: int = field(metadata={"desc": "A sample integer value"})
    status: str = field(metadata={"desc": "Status of the sample"})

    __input_fields__ = [
        "description",
        "category",
    ]  # Define which fields are input fields
    __output_fields__ = ["value", "status"]  # Define which fields are output fields

Now, let's create a parser that uses SampleDataClass to parse the output json string back into a data class instance.

from adalflow.components.output_parsers import DataClassParser

parser = DataClassParser(data_class=SampleDataClass, return_data_class=True, format_type="json")

Let's inspect the structure of the parser with print(parser).

The output will be:

DataClassParser(
    data_class=SampleDataClass, format_type=json,            return_data_class=True, input_fields=['description', 'category'],            output_fields=['value', 'status']
    (_output_processor): JsonParser()
    (output_format_prompt): Prompt(
        template: Your output should be formatted as a standard JSON instance with the following schema:
        ```
        {{schema}}
        ```
        -Make sure to always enclose the JSON output in triple backticks (```). Please do not add anything other than valid JSON output!
        -Use double quotes for the keys and string values.
        -DO NOT mistaken the "properties" and "type" in the schema as the actual fields in the JSON output.
        -Follow the JSON formatting conventions., prompt_variables: ['schema']
    )
)

You can get the output and input format strings with the following methods:

print(parser.get_input_format_str())
print(parser.get_output_format_str())

The output format string will be:

Your output should be formatted as a standard JSON instance with the following schema:
```
{
    "value": " (int) (required)",
    "status": " (str) (required)"
}
```
-Make sure to always enclose the JSON output in triple backticks (```). Please do not add anything other than valid JSON output!
-Use double quotes for the keys and string values.
-DO NOT mistaken the "properties" and "type" in the schema as the actual fields in the JSON output.
-Follow the JSON formatting conventions.

The input format string will be:

{
    "description": " (str) (required)",
    "category": " (str) (required)"
}

Convert a json string into a data class instance:

user_input = '{"description": "Parsed description", "category": "Sample Category", "value": 100, "status": "active"}'
parsed_instance = parser.call(user_input)

print(parsed_instance)

The output will be:

SampleDataClass(description='Parsed description', category='Sample Category', value=100, status='active')

Now try the examples string:

samples = [
    SampleDataClass(
        description="Sample description",
        category="Sample category",
        value=100,
        status="active",
    ),
    SampleDataClass(
        description="Another description",
        category="Another category",
        value=200,
        status="inactive",
    ),
]

examples_str = parser.get_examples_str(examples=samples)
print(examples_str)

The output will be:

examples_str:
{
    "description": "Sample description",
    "category": "Sample category",
    "value": 100,
    "status": "active"
}
__________
{
    "description": "Another description",
    "category": "Another category",
    "value": 200,
    "status": "inactive"
}
__________

API Reference