Chunk Node

Overview

The Chunk Node splits a string into an array of strings based on a token count.

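Chunking a string is useful for staying within an LLM's token limit. You can split a string into multiple chunks, feed each chunk into its own Chat Node, and then combine the outputs of those Chat Nodes. This makes it possible to answer questions about text that is too long for the LLM to process in a single request.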
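The split, process, and combine flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: whitespace-separated words stand in for real model tokens, and `split_into_chunks`, `process_in_chunks`, and `fake_model` are hypothetical names, not part of the Chunk Node or the Chat Node.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_tokens: int) -> List[str]:
    """Split text into pieces of at most max_tokens "tokens".

    Assumption: whitespace-separated words stand in for model tokens.
    A real tokenizer would count tokens differently.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def process_in_chunks(text: str, max_tokens: int,
                      ask_model: Callable[[str], str]) -> str:
    """Run ask_model on every chunk and join the partial answers."""
    chunks = split_into_chunks(text, max_tokens)
    partial_answers = [ask_model(chunk) for chunk in chunks]
    return "\n".join(partial_answers)


def fake_model(chunk: str) -> str:
    """Hypothetical stand-in for a Chat Node call."""
    return f"summary of {len(chunk.split())} words"


# 250 words with a limit of 100 yields chunks of 100, 100, and 50 words.
result = process_in_chunks("word " * 250, 100, fake_model)
print(result)
```

In practice the combined partial answers are often passed to one more model call that merges them into a single response.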

The Chunk Node can also truncate a string to a specific token count, from either the beginning or the end, by using the First or Last outputs.

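If an overlap percentage is specified, consecutive chunks overlap by that percentage of the max token count. For example, with a max token count of 100 and an overlap of 50%, consecutive chunks overlap by 50 tokens. This helps avoid losing context between chunks, but it can increase the total number of chunks.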
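The overlap arithmetic can be made concrete with a small sliding-window sketch. It is not the node's actual implementation: the function name `chunk_tokens` is hypothetical, the input is assumed to be an already tokenized list, and the overlap is given as a fraction from 0 to 1.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], max_tokens: int,
                 overlap: float = 0.0) -> List[List[str]]:
    """Split tokens into windows of at most max_tokens.

    Consecutive windows share int(max_tokens * overlap) tokens.
    Assumption: overlap is a fraction from 0 to 1, clamped to that range.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    overlap = min(max(overlap, 0.0), 1.0)
    overlap_tokens = int(max_tokens * overlap)
    # Always advance by at least one token so the loop terminates.
    stride = max(1, max_tokens - overlap_tokens)

    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the input
        start += stride
    return chunks


tokens = [f"t{i}" for i in range(250)]
plain = chunk_tokens(tokens, 100)             # windows start at 0, 100, 200
overlapped = chunk_tokens(tokens, 100, 0.5)   # windows start at 0, 50, 100, 150
print(len(plain), len(overlapped))            # prints "3 4"
```

With a max token count of 100 and an overlap of 0.5, each window repeats the last 50 tokens of the previous one, and the same 250-token input produces four chunks instead of three.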

Chunk Node Screenshot

Inputs

Title	Data Type	Description	Default Value	Notes
Input	`string`	The string that should be chunked.	(Required)	None

Outputs

Title	Data Type	Description	Notes
Chunks	`string[]`	The array of string chunks after splitting the string by the configured amount of tokens.	May be an array of length 1 if the string did not need to be split.
First	`string`	The first chunk in the chunks array.	Useful for truncating a string to a specified token count.
Last	`string`	The last chunk in the chunks array.	Useful for keeping the end of a string when truncating to a specified token count.
Indexes	`number[]`	A list of the indexes of the chunks.	Useful when filtering or zipping the chunks array, and other more complex tasks.
Count	`number`	The number of chunks in the chunks array.	Has many uses for more complex tasks.
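How the other outputs relate to the Chunks array can be shown with plain list operations. This is a sketch, not the node's code: the sample chunks are made up, and the indexes are assumed to start at 0, which the table above does not specify.

```python
# Made-up example of a Chunks output.
chunks = ["alpha beta", "gamma delta", "epsilon"]

first = chunks[0]                    # First: the first chunk
last = chunks[-1]                    # Last: the last chunk
indexes = list(range(len(chunks)))   # Indexes (assumed to start at 0)
count = len(chunks)                  # Count: the number of chunks

# Zipping the indexes with the chunks labels each chunk with its position.
labelled = [f"{i}: {chunk}" for i, chunk in zip(indexes, chunks)]
print(labelled)
```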

Editor Settings

Setting	Description	Default Value	Use Input Toggle	Input Data Type
Model	The model to use for tokenizing. Different LLMs use different tokenizers.	gpt-3.5-turbo	Yes	`string`
Max Tokens	The maximum number of tokens in the chunk.	1024	Yes	`number`
Overlap	The amount of overlap between chunks, given as a fraction from 0 to 1 (0% to 100%) of the max token count.	0	Yes	`number`

Error Handling

The Chunk Node has no notable error handling behavior. If the input is not a string, it is coerced into a string.