
Extracting Information from Emails with DSPy

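This tutorial shows how to build an intelligent email processing system with DSPy. We will create a system that automatically extracts key information from many kinds of emails, classifies their intent, and structures the data for downstream processing.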

What You'll Build

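By the end of this tutorial, you will have a DSPy-powered email processing system that can:

  • Classify email types (order confirmations, support requests, meeting invitations, etc.)
  • Extract key entities (dates, amounts, product names, contact information)
  • Determine urgency levels and required actions
  • Structure the extracted data into a consistent format
  • Handle a wide variety of email formats robustly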

Prerequisites

  • A basic understanding of DSPy modules and signatures
  • Python 3.9+ installed
  • An OpenAI API key (or access to another supported LLM)

Installation and Setup

pip install dspy

Step 1: Define Our Data Structures

First, let's define the kinds of information we want to extract from emails:

import dspy
from typing import Optional
from pydantic import BaseModel
from enum import Enum

class EmailType(str, Enum):
    ORDER_CONFIRMATION = "order_confirmation"
    SUPPORT_REQUEST = "support_request"
    MEETING_INVITATION = "meeting_invitation"
    NEWSLETTER = "newsletter"
    PROMOTIONAL = "promotional"
    INVOICE = "invoice"
    SHIPPING_NOTIFICATION = "shipping_notification"
    OTHER = "other"

class UrgencyLevel(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

class ExtractedEntity(BaseModel):
    entity_type: str
    value: str
    confidence: float

class EmailInsight(BaseModel):
    email_type: EmailType
    urgency: UrgencyLevel
    summary: str
    key_entities: list[ExtractedEntity]
    action_required: bool
    deadline: Optional[str] = None
    amount: Optional[float] = None
    sender_info: Optional[str] = None
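A typed output field only helps if the raw model text maps cleanly onto an enum member. The stdlib-only sketch below mirrors the EmailType enum from this step. It shows one way to normalize a free-form label before the lookup. coerce_email_type is a hypothetical helper, and DSPy's own output parsing may already handle some of these cases. Because EmailType mixes in str, each member also compares equal to its plain string value.

```python
from enum import Enum


class EmailType(str, Enum):
    """Same members as the EmailType enum defined in this step."""
    ORDER_CONFIRMATION = "order_confirmation"
    SUPPORT_REQUEST = "support_request"
    MEETING_INVITATION = "meeting_invitation"
    NEWSLETTER = "newsletter"
    PROMOTIONAL = "promotional"
    INVOICE = "invoice"
    SHIPPING_NOTIFICATION = "shipping_notification"
    OTHER = "other"


def coerce_email_type(raw: str) -> EmailType:
    """Hypothetical helper: map a label such as 'Order Confirmation' to EmailType.

    Unknown labels fall back to EmailType.OTHER instead of raising ValueError.
    """
    normalized = raw.strip().lower().replace("-", "_").replace(" ", "_")
    try:
        return EmailType(normalized)  # look up the member by its value
    except ValueError:
        return EmailType.OTHER


if __name__ == "__main__":
    for label in ["Order Confirmation", "shipping-notification", "spam"]:
        print(label, "->", coerce_email_type(label).value)
```

Falling back to OTHER keeps the pipeline running on surprising labels. The trade-off is that it can hide misclassifications, so you may prefer to log the raw label as well.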

Step 2: Create DSPy Signatures

Next, let's define the signatures for the email processing pipeline:

class ClassifyEmail(dspy.Signature):
    """Classify the type and urgency of an email based on its content."""

    email_subject: str = dspy.InputField(desc="The subject line of the email")
    email_body: str = dspy.InputField(desc="The main content of the email")
    sender: str = dspy.InputField(desc="Email sender information")

    email_type: EmailType = dspy.OutputField(desc="The classified type of email")
    urgency: UrgencyLevel = dspy.OutputField(desc="The urgency level of the email")
    reasoning: str = dspy.OutputField(desc="Brief explanation of the classification")

class ExtractEntities(dspy.Signature):
    """Extract key entities and information from email content."""

    email_content: str = dspy.InputField(desc="The full email content including subject and body")
    email_type: EmailType = dspy.InputField(desc="The classified type of email")

    key_entities: list[ExtractedEntity] = dspy.OutputField(desc="List of extracted entities with type, value, and confidence")
    financial_amount: Optional[float] = dspy.OutputField(desc="Total monetary amount as a plain number (e.g., 99.99 for '$99.99'), or None if no amount is present")
    important_dates: list[str] = dspy.OutputField(desc="List of important dates found in the email")
    contact_info: list[str] = dspy.OutputField(desc="Relevant contact information extracted")

class GenerateActionItems(dspy.Signature):
    """Determine what actions are needed based on the email content and extracted information."""

    email_type: EmailType = dspy.InputField()
    urgency: UrgencyLevel = dspy.InputField()
    email_summary: str = dspy.InputField(desc="Brief summary of the email content")
    extracted_entities: list[ExtractedEntity] = dspy.InputField(desc="Key entities found in the email")

    action_required: bool = dspy.OutputField(desc="Whether any action is required")
    action_items: list[str] = dspy.OutputField(desc="List of specific actions needed")
    deadline: Optional[str] = dspy.OutputField(desc="Deadline for action if applicable")
    priority_score: int = dspy.OutputField(desc="Priority score from 1-10")

class SummarizeEmail(dspy.Signature):
    """Create a concise summary of the email content."""

    email_subject: str = dspy.InputField()
    email_body: str = dspy.InputField()
    key_entities: list[ExtractedEntity] = dspy.InputField()

    summary: str = dspy.OutputField(desc="A 2-3 sentence summary of the email's main points")
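financial_amount is declared as Optional[float], but amounts appear in email text as formatted strings such as '$2,399.00'. If you post-process text yourself, or a model returns the formatted string, a small normalizer keeps the field numeric. parse_amount below is a hypothetical helper. It assumes a leading $, € or £ symbol and comma thousands separators, so formats like '1.234,56' are out of scope.

```python
import re
from typing import Optional

# Currency symbol, optional space, digits (optionally grouped with commas), optional cents.
_AMOUNT_RE = re.compile(r"[$€£]\s*(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{1,2}))?")


def parse_amount(text: str) -> Optional[float]:
    """Return the first currency amount in text as a float, or None if there is none."""
    match = _AMOUNT_RE.search(text)
    if match is None:
        return None
    whole = match.group(1).replace(",", "")  # drop thousands separators
    cents = match.group(2) or "0"
    return float(f"{whole}.{cents}")


if __name__ == "__main__":
    print(parse_amount("Order Total: $2,399.00"))  # prints 2399.0
```

The pattern requires a currency symbol. This is deliberate: bare numbers such as order IDs ("#12345") or tracking numbers are not mistaken for amounts.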

Step 3: Build the Email Processing Module

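Now let's create the main email processing module. It chains four ChainOfThought predictors: classify the email, extract entities, summarize, then determine actions.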

class EmailProcessor(dspy.Module):
    """A comprehensive email processing system using DSPy."""

    def __init__(self):
        super().__init__()

        # Initialize our processing components
        self.classifier = dspy.ChainOfThought(ClassifyEmail)
        self.entity_extractor = dspy.ChainOfThought(ExtractEntities)
        self.action_generator = dspy.ChainOfThought(GenerateActionItems)
        self.summarizer = dspy.ChainOfThought(SummarizeEmail)

    def forward(self, email_subject: str, email_body: str, sender: str = ""):
        """Process an email and extract structured information."""

        # Step 1: Classify the email
        classification = self.classifier(
            email_subject=email_subject,
            email_body=email_body,
            sender=sender
        )

        # Step 2: Extract entities
        full_content = f"Subject: {email_subject}\n\nFrom: {sender}\n\n{email_body}"
        entities = self.entity_extractor(
            email_content=full_content,
            email_type=classification.email_type
        )

        # Step 3: Generate summary
        summary = self.summarizer(
            email_subject=email_subject,
            email_body=email_body,
            key_entities=entities.key_entities
        )

        # Step 4: Determine actions
        actions = self.action_generator(
            email_type=classification.email_type,
            urgency=classification.urgency,
            email_summary=summary.summary,
            extracted_entities=entities.key_entities
        )

        # Step 5: Structure the results
        return dspy.Prediction(
            email_type=classification.email_type,
            urgency=classification.urgency,
            summary=summary.summary,
            key_entities=entities.key_entities,
            financial_amount=entities.financial_amount,
            important_dates=entities.important_dates,
            action_required=actions.action_required,
            action_items=actions.action_items,
            deadline=actions.deadline,
            priority_score=actions.priority_score,
            reasoning=classification.reasoning,
            contact_info=entities.contact_info
        )
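forward() depends on call order. Each predictor runs after the predictors whose outputs it consumes, and two fields are renamed on the way: summary is passed as email_summary, and key_entities as extracted_entities. The plain-Python sketch below records that wiring as data and checks it. PIPELINE and check_wiring are illustrative names, not DSPy APIs, and the field lists are copied by hand from the Step 2 signatures.

```python
from typing import Dict, List, Set, Tuple

# (stage, {input field: source field}, output fields), mirroring EmailProcessor.forward().
Stage = Tuple[str, Dict[str, str], Set[str]]

# Values available before any stage runs: forward() arguments plus full_content,
# which forward() builds from them.
INITIAL_FIELDS = {"email_subject", "email_body", "sender", "full_content"}

PIPELINE: List[Stage] = [
    ("classifier",
     {"email_subject": "email_subject", "email_body": "email_body", "sender": "sender"},
     {"email_type", "urgency", "reasoning"}),
    ("entity_extractor",
     {"email_content": "full_content", "email_type": "email_type"},
     {"key_entities", "financial_amount", "important_dates", "contact_info"}),
    ("summarizer",
     {"email_subject": "email_subject", "email_body": "email_body", "key_entities": "key_entities"},
     {"summary"}),
    ("action_generator",
     {"email_type": "email_type", "urgency": "urgency",
      "email_summary": "summary", "extracted_entities": "key_entities"},
     {"action_required", "action_items", "deadline", "priority_score"}),
]


def check_wiring(pipeline: List[Stage], initial: Set[str]) -> List[str]:
    """Describe every stage input whose source has not been produced yet."""
    available = set(initial)
    problems = []
    for name, inputs, outputs in pipeline:
        for field, source in inputs.items():
            if source not in available:
                problems.append(f"{name}.{field} needs '{source}' before it exists")
        available |= outputs  # this stage's outputs feed later stages
    return problems


if __name__ == "__main__":
    print(check_wiring(PIPELINE, INITIAL_FIELDS))  # prints []
```

Moving the summarizer ahead of the entity extractor makes check_wiring report exactly one problem: the summarizer would need key_entities before they exist.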

Step 4: Run the Email Processing System

Let's write a simple function to test our email processing system:

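import os

def run_email_processing_demo():
    """Demonstration of the email processing system."""

    # Provide the API key before configuring the LM
    # (in practice, export OPENAI_API_KEY in your shell rather than hardcoding it)
    os.environ["OPENAI_API_KEY"] = "<YOUR OPENAI KEY>"

    # Configure DSPy
    lm = dspy.LM(model='openai/gpt-4o-mini')
    dspy.configure(lm=lm)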

    # Create our email processor
    processor = EmailProcessor()

    # Sample emails for testing
    sample_emails = [
        {
            "subject": "Order Confirmation #12345 - Your MacBook Pro is on the way!",
            "body": """Dear John Smith,

Thank you for your order! We're excited to confirm that your order #12345 has been processed.

Order Details:
- MacBook Pro 14-inch (Space Gray)
- Order Total: $2,399.00
- Estimated Delivery: December 15, 2024
- Tracking Number: 1Z999AA1234567890

If you have any questions, please contact our support team at support@techstore.com.

Best regards,
TechStore Team""",
            "sender": "orders@techstore.com"
        },
        {
            "subject": "URGENT: Server Outage - Immediate Action Required",
            "body": """Hi DevOps Team,

We're experiencing a critical server outage affecting our production environment.

Impact: All users unable to access the platform
Started: 2:30 PM EST

Please join the emergency call immediately: +1-555-123-4567

This is our highest priority.

Thanks,
Site Reliability Team""",
            "sender": "alerts@company.com"
        },
        {
            "subject": "Meeting Invitation: Q4 Planning Session",
            "body": """Hello team,

You're invited to our Q4 planning session.

When: Friday, December 20, 2024 at 2:00 PM - 4:00 PM EST
Where: Conference Room A

Please confirm your attendance by December 18th.

Best,
Sarah Johnson""",
            "sender": "sarah.johnson@company.com"
        }
    ]

    # Process each email and display results
    print("🚀 Email Processing Demo")
    print("=" * 50)

    for i, email in enumerate(sample_emails):
        print(f"\n📧 EMAIL {i+1}: {email['subject'][:50]}...")

        # Process the email
        result = processor(
            email_subject=email["subject"],
            email_body=email["body"],
            sender=email["sender"]
        )

        # Display key results
        print(f"   📊 Type: {result.email_type}")
        print(f"   🚨 Urgency: {result.urgency}")
        print(f"   📝 Summary: {result.summary}")

        if result.financial_amount:
            print(f"   💰 Amount: ${result.financial_amount:,.2f}")

        if result.action_required:
            print("   ✅ Action Required: Yes")
            if result.deadline:
                print(f"   ⏰ Deadline: {result.deadline}")
        else:
            print("   ✅ Action Required: No")

# Run the demo
if __name__ == "__main__":
    run_email_processing_demo()
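To run the processor on real messages instead of the hardcoded samples, you need the subject, plain-text body and sender from each raw email. Below is a minimal sketch using Python's standard email package. It assumes the message is available as RFC 5322 text, for example fetched over IMAP with imaplib. email_to_processor_args is a hypothetical helper, and HTML-only messages yield an empty body in this version.

```python
from email import message_from_string
from email.policy import default


def email_to_processor_args(raw: str) -> dict:
    """Hypothetical helper: turn raw RFC 5322 text into EmailProcessor keyword arguments."""
    # policy=default returns an EmailMessage, which provides get_body()
    msg = message_from_string(raw, policy=default)
    body_part = msg.get_body(preferencelist=("plain",))  # None if there is no text/plain part
    body = body_part.get_content() if body_part is not None else ""
    return {
        "email_subject": str(msg.get("Subject", "")),
        "email_body": body,
        "sender": str(msg.get("From", "")),
    }


if __name__ == "__main__":
    raw = (
        "From: orders@techstore.com\n"
        "Subject: Order Confirmation #12345\n"
        "\n"
        "Thank you for your order!\n"
    )
    print(email_to_processor_args(raw))
```

With an LM configured, the returned dict can be unpacked directly into the module: processor(**email_to_processor_args(raw)).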

Expected Output

🚀 Email Processing Demo
==================================================

📧 EMAIL 1: Order Confirmation #12345 - Your MacBook Pro is on...
   📊 Type: order_confirmation
   🚨 Urgency: low
   📝 Summary: The email confirms John Smith's order #12345 for a MacBook Pro 14-inch in Space Gray, totaling $2,399.00, with an estimated delivery date of December 15, 2024. It includes a tracking number and contact information for customer support.
   💰 Amount: $2,399.00
   ✅ Action Required: No

📧 EMAIL 2: URGENT: Server Outage - Immediate Action Required...
   📊 Type: other
   🚨 Urgency: critical
   📝 Summary: The Site Reliability Team has reported a critical server outage that began at 2:30 PM EST, preventing all users from accessing the platform. They have requested the DevOps Team to join an emergency call immediately to address the issue.
   ✅ Action Required: Yes
   ⏰ Deadline: Immediately

📧 EMAIL 3: Meeting Invitation: Q4 Planning Session...
   📊 Type: meeting_invitation
   🚨 Urgency: medium
   📝 Summary: Sarah Johnson has invited the team to a Q4 planning session on December 20, 2024, from 2:00 PM to 4:00 PM EST in Conference Room A. Attendees are asked to confirm their participation by December 18th.
   ✅ Action Required: Yes
   ⏰ Deadline: December 18th
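Because urgency, action_required and priority_score come back as structured fields, downstream code can order an inbox without further LM calls. The sketch below shows one possible ordering, under stated assumptions. Action-required emails come first, then emails sorted by urgency, then by priority_score clamped to the 1-10 range the signature asks for. ProcessedEmail and triage are hypothetical names. Urgency is taken as the plain string values shown above, and the priority scores in the usage block are invented.

```python
from dataclasses import dataclass
from typing import List

# Lower rank sorts first; unknown labels sort after "low".
URGENCY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class ProcessedEmail:
    """Hypothetical record holding a few fields from the processor's Prediction."""
    subject: str
    urgency: str
    priority_score: int
    action_required: bool


def triage(emails: List[ProcessedEmail]) -> List[ProcessedEmail]:
    """Order emails: action required first, then most urgent, then highest priority."""
    def sort_key(e: ProcessedEmail):
        score = min(max(e.priority_score, 1), 10)  # clamp to the 1-10 range
        rank = URGENCY_RANK.get(e.urgency, len(URGENCY_RANK))
        return (not e.action_required, rank, -score)
    return sorted(emails, key=sort_key)


if __name__ == "__main__":
    # Urgency and action flags follow the expected output above; scores are invented.
    inbox = [
        ProcessedEmail("Order Confirmation #12345", "low", 2, False),
        ProcessedEmail("URGENT: Server Outage", "critical", 10, True),
        ProcessedEmail("Meeting Invitation: Q4 Planning Session", "medium", 5, True),
    ]
    for email in triage(inbox):
        print(email.urgency, email.subject)
```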

Next Steps

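  • Add more email types and refine classification (newsletters, promotional emails, etc.)
  • Integrate with email providers (Gmail API, Outlook, IMAP)
  • Experiment with different LLMs and optimization strategies
  • Add multilingual support for processing international email
  • Use optimization to improve program performance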
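Optimizing the program requires a metric. DSPy optimizers and dspy.Evaluate call a metric as metric(example, pred, trace=None) and expect a number or bool back. The sketch below scores only the email_type field. email_type_metric and average_score are hypothetical names. SimpleNamespace objects stand in for dspy.Example and dspy.Prediction, so the sketch runs without an LM. With a labeled dev set, the same function could be passed to an optimizer such as dspy.BootstrapFewShot(metric=email_type_metric).

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def _label(value) -> str:
    """Accept either a str-mixin Enum member or a plain string label."""
    return str(getattr(value, "value", value)).strip().lower()


def email_type_metric(example, pred, trace=None) -> float:
    """1.0 if the predicted email_type matches the gold label, else 0.0."""
    return float(_label(example.email_type) == _label(pred.email_type))


def average_score(metric, pairs: Iterable[Tuple[object, object]]) -> float:
    """Mean metric value over (example, prediction) pairs; 0.0 for no pairs."""
    scores = [metric(ex, pred) for ex, pred in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Stand-ins for dspy.Example / dspy.Prediction; labels are illustrative.
    pairs = [
        (SimpleNamespace(email_type="order_confirmation"),
         SimpleNamespace(email_type="order_confirmation")),
        (SimpleNamespace(email_type="support_request"),
         SimpleNamespace(email_type="other")),
    ]
    print(average_score(email_type_metric, pairs))  # prints 0.5
```

A metric that checks only the type is a narrow starting point. Extending it to urgency or extracted amounts would give an optimizer more signal.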