# PII Masking - Presidio

## Quick Start

LiteLLM supports Microsoft Presidio for PII masking.
### 1. Define the guardrail in your LiteLLM config.yaml

Define your guardrail under the `guardrails` section:

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "presidio-pre-guard"
    litellm_params:
      guardrail: presidio  # supported values: "aporia", "bedrock", "lakera", "presidio"
      mode: "pre_call"
```
Set the following environment variables:

```shell
export PRESIDIO_ANALYZER_API_BASE="http://localhost:5002"
export PRESIDIO_ANONYMIZER_API_BASE="http://localhost:5001"
```
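These endpoints assume the Presidio analyzer and anonymizer services are already running locally. One way to start them, assuming the official Presidio Docker images (which listen on port 3000 inside the container), is:

```shell
# Assumption: the official Presidio images on mcr.microsoft.com,
# mapped to the ports used by the environment variables above.
docker run --rm -d -p 5002:3000 mcr.microsoft.com/presidio-analyzer:latest
docker run --rm -d -p 5001:3000 mcr.microsoft.com/presidio-anonymizer:latest
```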
Supported values for `mode`:

- `pre_call`: runs before the LLM API call, on the input
- `post_call`: runs after the LLM API call, on the input and output
- `logging_only`: runs after the LLM API call; PII masking is applied only before logging to Langfuse, etc. It is not applied to the actual LLM API request/response.
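For example, masking PII in both the input and the model's output only requires switching the mode. A sketch based on the config above (the guardrail name `presidio-post-guard` is illustrative):

```yaml
guardrails:
  - guardrail_name: "presidio-post-guard"
    litellm_params:
      guardrail: presidio
      mode: "post_call"
```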
### 2. Start the LiteLLM Gateway

```shell
litellm --config config.yaml --detailed_debug
```
### 3. Test the request

Expect this to mask `Jane Doe`, since it is PII:

```shell
curl http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Hello my name is Jane Doe"}
    ],
    "guardrails": ["presidio-pre-guard"]
  }'
```
Expected response, with the name masked as `<PERSON>`:

```json
{
  "id": "chatcmpl-A3qSC39K7imjGbZ8xCDacGJZBoTJQ",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello, <PERSON>! How can I assist you today?",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null
      }
    }
  ],
  "created": 1725479980,
  "model": "gpt-3.5-turbo-2024-07-18",
  "object": "chat.completion",
  "system_fingerprint": "fp_5bd87c427a",
  "usage": {
    "completion_tokens": 13,
    "prompt_tokens": 14,
    "total_tokens": 27
  },
  "service_tier": null
}
```
A request without PII passes through unmodified:

```shell
curl http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Hello good morning"}
    ],
    "guardrails": ["presidio-pre-guard"]
  }'
```
## Advanced

### Set `language` per request

The Presidio API supports passing the `language` param. Here is how to set it per request:

```shell
curl http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "is this credit card number 9283833 correct?"}
    ],
    "guardrails": ["presidio-pre-guard"],
    "guardrail_config": {"language": "es"}
  }'
```
The same, using the OpenAI Python SDK:

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to the model set on the litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    extra_body={
        "metadata": {
            "guardrails": ["presidio-pre-guard"],
            "guardrail_config": {"language": "es"}
        }
    }
)

print(response)
```
### Output Parsing

LLM responses can sometimes contain the masked tokens.

For Presidio's 'replace' operation, LiteLLM can check the LLM response and replace the masked tokens with the values the user originally submitted.
Define your guardrail under the `guardrails` section:

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "presidio-pre-guard"
    litellm_params:
      guardrail: presidio  # supported values: "aporia", "bedrock", "lakera", "presidio"
      mode: "pre_call"
      output_parse_pii: True
```
Expected flow:

1. User input: `"hello world, my name is Jane Doe. My number is: 034453334"`
2. LLM input: `"hello world, my name is [PERSON]. My number is: [PHONE_NUMBER]"`
3. LLM response: `"Hey [PERSON], nice to meet you!"`
4. User response: `"Hey Jane Doe, nice to meet you!"`
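The restoration step in this flow can be sketched in a few lines of Python. This is a minimal illustration of the idea, not LiteLLM's actual implementation; the token map below is hypothetical and would normally be captured when the request is masked:

```python
# Minimal sketch of output parsing for the 'replace' operation:
# at mask time, remember which original value each masked token replaced,
# then substitute the tokens back in the LLM's response.

def restore_pii(response_text: str, token_map: dict) -> str:
    """Swap masked tokens (e.g. '[PERSON]') back to the user's original values."""
    for masked_token, original_value in token_map.items():
        response_text = response_text.replace(masked_token, original_value)
    return response_text

# Captured at mask time: masked token -> original user value (hypothetical)
token_map = {"[PERSON]": "Jane Doe", "[PHONE_NUMBER]": "034453334"}

llm_response = "Hey [PERSON], nice to meet you!"
print(restore_pii(llm_response, token_map))  # prints: Hey Jane Doe, nice to meet you!
```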
### Ad Hoc Recognizers

Send ad-hoc recognizers to Presidio `/analyze` by passing a JSON file to the proxy.

Define the ad-hoc recognizer in your LiteLLM config.yaml, under the `guardrails` section:

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "presidio-pre-guard"
    litellm_params:
      guardrail: presidio  # supported values: "aporia", "bedrock", "lakera", "presidio"
      mode: "pre_call"
      presidio_ad_hoc_recognizers: "./hooks/example_presidio_ad_hoc_recognizer.json"
```
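The referenced file holds a list of recognizers in Presidio's ad-hoc recognizer format. A minimal sketch of what such a file could look like (the `AHV_NUMBER` entity name, pattern, regex, and score below are illustrative assumptions, not an official definition):

```json
[
  {
    "name": "AHV Number Recognizer",
    "supported_language": "en",
    "patterns": [
      {
        "name": "AHV number pattern",
        "regex": "756\\.\\d{4}\\.\\d{4}\\.\\d{2}",
        "score": 0.85
      }
    ],
    "supported_entity": "AHV_NUMBER"
  }
]
```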
Set the following environment variables:

```shell
export PRESIDIO_ANALYZER_API_BASE="http://localhost:5002"
export PRESIDIO_ANONYMIZER_API_BASE="http://localhost:5001"
```
You can see it working when you run the proxy:

```shell
litellm --config /path/to/config.yaml --debug
```

Make a chat completions request, for example:

```json
{
  "model": "azure-gpt-3.5",
  "messages": [{"role": "user", "content": "John Smith AHV number is 756.3026.0705.92. Zip code: 1334023"}]
}
```

And search for any log starting with `Presidio PII Masking`, for example:

```
Presidio PII Masking: Redacted pii message: <PERSON> AHV number is <AHV_NUMBER>. Zip code: <US_DRIVER_LICENSE>
```
### Logging Only

Apply PII masking only before logging to Langfuse, etc.

It is not applied to the actual LLM API request/response.

**Note**: This currently only works for `/chat/completions` requests, on 'success' logging.

1. Define `mode: logging_only` in your LiteLLM config.yaml
Define your guardrail under the `guardrails` section:

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  - guardrail_name: "presidio-pre-guard"
    litellm_params:
      guardrail: presidio  # supported values: "aporia", "bedrock", "lakera", "presidio"
      mode: "logging_only"
```
Set the following environment variables:

```shell
export PRESIDIO_ANALYZER_API_BASE="http://localhost:5002"
export PRESIDIO_ANONYMIZER_API_BASE="http://localhost:5001"
```
2. Start the proxy:

```shell
litellm --config /path/to/config.yaml
```
3. Test it!

```shell
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Hi, my name is Jane!"
      }
    ]
  }'
```
Expected logged response:

```
Hi, my name is <PERSON>!
```