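1) Using PythonInterpreter
ProgramOfThought integrates an adapted Python interpreter to execute code generated by LMs.
As a brief example of how the interpreter works, we create a dspy.PythonInterpreter instance and demonstrate the execution that underlies ProgramOfThought.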
import dspy
interpreter = dspy.PythonInterpreter()
expr = "value = 2*5 + 4\nvalue"
answer = interpreter.execute(expr)
answer
14
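To make the "last expression is the result" behavior above concrete, here is a minimal pure-Python sketch. It is not how dspy.PythonInterpreter is implemented (that internals are not shown here, and this sketch has no sandboxing); the helper name run_and_return_last is hypothetical and only mimics the observable behavior of the example.

```python
import ast

def run_and_return_last(source: str):
    """Execute `source`; if its last statement is an expression, return its value."""
    tree = ast.parse(source)
    namespace = {}
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the trailing expression so it can be evaluated separately.
        last = ast.Expression(body=tree.body.pop().value)
        exec(compile(tree, "<snippet>", "exec"), namespace)
        return eval(compile(last, "<snippet>", "eval"), namespace)
    # No trailing expression: run everything and return nothing.
    exec(compile(tree, "<snippet>", "exec"), namespace)
    return None

result = run_and_return_last("value = 2*5 + 4\nvalue")
print(result)  # 14
```

Plain exec/eval is acceptable for a trusted snippet like this one; code produced by an LM should go through an isolated interpreter instead.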
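2) Demonstrating ProgramOfThought
As an example, we define a signature with an input question and an output answer. We then create and invoke a ProgramOfThought program, which uses an LM to first generate code that represents the question, executes that code with the interpreter, and returns the final result as the answer to the question.
Let's use Meta's Llama-3-70b-Instruct. You can easily swap it for another provider or a local model.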
# "API_BASE" is a placeholder for your endpoint URL.
llama3_70b = dspy.LM("openai/meta-llama/Meta-Llama-3-70b-Instruct", api_base="API_BASE", api_key="None")
dspy.settings.configure(lm=llama3_70b)
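Now let's define our module with a short signature that specifies the input question and the output answer. We can then call ProgramOfThought on that signature and pass in our sample problem.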
class BasicGenerateAnswer(dspy.Signature):
    question = dspy.InputField()
    answer = dspy.OutputField()
pot = dspy.ProgramOfThought(BasicGenerateAnswer)
problem = "2*5 + 4"
pot(question=problem).answer
'14'
Great! The module produced the same correct answer. Let's see exactly how it used the LM to do this:
dspy.inspect_history()
[2025-01-06T21:58:40.879405] System message: Your input fields are: 1. `question` (str) 2. `final_generated_code` (str): python code that answers the question 3. `code_output` (str): output of previously-generated python code Your output fields are: 1. `reasoning` (str) 2. `answer` (str) All interactions will be structured in the following way, with the appropriate values filled in. [[ ## question ## ]] {question} [[ ## final_generated_code ## ]] {final_generated_code} [[ ## code_output ## ]] {code_output} [[ ## reasoning ## ]] {reasoning} [[ ## answer ## ]] {answer} [[ ## completed ## ]] In adhering to this structure, your objective is: Given the final code `question`, `final_generated_code`, `code_output`, provide the final `answer`. User message: [[ ## question ## ]] 2*5 + 4 [[ ## final_generated_code ## ]] def calculate_expression(): # Multiply 2 and 5 multiplication_result = 2 * 5 # Add 4 to the result final_result = multiplication_result + 4 return final_result # Execute the function to get the final answer answer = calculate_expression() print(answer) [[ ## code_output ## ]] 14 Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`. Response: [[ ## reasoning ## ]] The given code defines a function `calculate_expression` that calculates the result of the expression 2*5 + 4. It first multiplies 2 and 5, then adds 4 to the result. The function is then executed, and the result is printed. [[ ## answer ## ]] 14 [[ ## completed ## ]]
We see that the generated Python code defines a function for the intermediate computation and, once executed by the PythonInterpreter, yields the correct final answer.
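The generate-then-execute flow described above can be sketched without any LM. This is an illustration under stated assumptions, not DSPy's implementation: fake_generate_code is a hypothetical stand-in for the LM, run_code uses plain exec rather than a sandboxed interpreter, and the retry-on-error loop assumes that max_iters bounds the number of code-generation attempts. In the real module a further LM call turns the code output into the final answer, as the trace above shows; the sketch returns the code output directly.

```python
import contextlib
import io

def run_code(code: str) -> str:
    """Run code with exec and return whatever it printed (no sandboxing)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()

def program_of_thought_sketch(question, generate_code, max_iters=3):
    """Generate code, execute it, and retry with the error message on failure."""
    error = None
    for _ in range(max_iters):
        code = generate_code(question, error)
        try:
            return code, run_code(code)
        except Exception as exc:
            # Feed the failure back so the next attempt can repair the code.
            error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"no runnable code after {max_iters} attempts: {error}")

def fake_generate_code(question, previous_error):
    """Hypothetical stand-in for the LM: fails first, then 'repairs' its code."""
    if previous_error is None:
        return "print(undefined_name)"
    return f"answer = {question}\nprint(answer)"

code, output = program_of_thought_sketch("2*5 + 4", fake_generate_code)
print(output)  # 14
```

The first attempt raises a NameError, the second succeeds, so the loop ends after two of its three allowed iterations.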
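3) Comparing with ChainOfThought
Now we turn to a more complex problem to show where the ProgramOfThought module helps.
Problem: Compute 12! / sum of prime numbers between 1 and 30.
This is a fairly challenging computation. Let's first see how ChainOfThought performs: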
problem = "Compute 12! / sum of prime numbers between 1 and 30."
cot = dspy.ChainOfThought(BasicGenerateAnswer)
cot(question=problem).answer
'3,710,009'
dspy.inspect_history()
[2025-01-06T21:59:08.539739] System message: Your input fields are: 1. `question` (str) Your output fields are: 1. `reasoning` (str) 2. `answer` (str) All interactions will be structured in the following way, with the appropriate values filled in. [[ ## question ## ]] {question} [[ ## reasoning ## ]] {reasoning} [[ ## answer ## ]] {answer} [[ ## completed ## ]] In adhering to this structure, your objective is: Given the fields `question`, produce the fields `answer`. User message: [[ ## question ## ]] Compute 12! / sum of prime numbers between 1 and 30. Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`. Response: [[ ## reasoning ## ]] To solve this problem, we need to calculate 12! (12 factorial) and the sum of prime numbers between 1 and 30. First, let's calculate 12!. 12! = 12 * 11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 = 479,001,600. Next, let's find the prime numbers between 1 and 30. The prime numbers between 1 and 30 are 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29. Now, let's calculate the sum of these prime numbers. sum = 2 + 3 + 5 + 7 + 11 + 13 + 17 + 19 + 23 + 29 = 129. Finally, let's calculate 12! / sum of prime numbers between 1 and 30. result = 479,001,600 / 129 = 3,710,009. [[ ## answer ## ]] 3,710,009 [[ ## completed ## ]]
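So ChainOfThought does fairly well at step-by-step reasoning: it correctly computes the value of 12! and the sum of only the primes between 1 and 30.
It fails at the final division step, however, incorrectly computing 479,001,600 / 129 = 3,710,009, whereas the correct answer is 3713190.69767 (verified with a real calculator!).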
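The arithmetic can be checked independently of any LM. The short script below is only a verification sketch; the helper primes_up_to is a hypothetical name introduced here, and it reads "between 1 and 30" as the primes from 2 through 29.

```python
import math

def primes_up_to(limit: int) -> list[int]:
    """Return all primes p with 2 <= p <= limit using trial division."""
    return [n for n in range(2, limit + 1)
            if all(n % d for d in range(2, math.isqrt(n) + 1))]

factorial_12 = math.factorial(12)
prime_sum = sum(primes_up_to(30))
quotient = factorial_12 / prime_sum

print(factorial_12, prime_sum)  # 479001600 129
print(round(quotient, 5))       # 3713190.69767
# The ChainOfThought answer fails a simple multiplication check.
print(3_710_009 * prime_sum == factorial_12)  # False
```

Both intermediate values match the ChainOfThought trace, which confirms that only the final division went wrong.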
Let's see how ProgramOfThought does:
pot(question=problem).answer
'3713190.697674419'
dspy.inspect_history()
[2025-01-06T21:59:13.140776] System message: Your input fields are: 1. `question` (str) 2. `final_generated_code` (str): python code that answers the question 3. `code_output` (str): output of previously-generated python code Your output fields are: 1. `reasoning` (str) 2. `answer` (str) All interactions will be structured in the following way, with the appropriate values filled in. [[ ## question ## ]] {question} [[ ## final_generated_code ## ]] {final_generated_code} [[ ## code_output ## ]] {code_output} [[ ## reasoning ## ]] {reasoning} [[ ## answer ## ]] {answer} [[ ## completed ## ]] In adhering to this structure, your objective is: Given the final code `question`, `final_generated_code`, `code_output`, provide the final `answer`. User message: [[ ## question ## ]] Compute 12! / sum of prime numbers between 1 and 30. [[ ## final_generated_code ## ]] def is_prime(n): """Check if a number is prime.""" if n < 2: return False for i in range(2, int(n**0.5) + 1): if n % i == 0: return False return True def sum_of_primes(start, end): """Calculate the sum of prime numbers in a given range.""" return sum(num for num in range(start, end + 1) if is_prime(num)) def factorial(n): """Calculate the factorial of a number.""" result = 1 for i in range(1, n + 1): result *= i return result # Calculate the factorial of 12 fact_12 = factorial(12) # Calculate the sum of prime numbers between 1 and 30 sum_primes = sum_of_primes(1, 30) # Calculate the final result result = fact_12 / sum_primes print(result) [[ ## code_output ## ]] 3713190.697674419 Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`. Response: [[ ## reasoning ## ]] The problem requires us to compute the value of 12! divided by the sum of prime numbers between 1 and 30. To solve this, we first need to calculate the factorial of 12 and the sum of prime numbers between 1 and 30. 
We can use the formulas for factorial and prime numbers to achieve this. The factorial of a number n (denoted as n!) is the product of all positive integers less than or equal to n. A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. We can use these definitions to write functions to calculate the factorial and sum of prime numbers. Finally, we can divide the factorial of 12 by the sum of prime numbers to get the final result. [[ ## answer ## ]] 3713190.697674419 [[ ## completed ## ]]
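Because the Python interpreter executes the code exactly, ProgramOfThought mitigates the computation errors that can occur in ChainOfThought, improving correctness particularly for numerical and logical queries.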
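4) Computation with Contextual Reasoning
Now let's try a more complex example: performing a computation inside a complex math word problem.
Step 1: Define a helper function to search Wikipedia
We will use a dspy.ColBERTv2 server to retrieve the top-matching passages from Wikipedia and parse them inside the ProgramOfThought pipeline.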
def search_wikipedia(query: str):
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer the non-numerical components of a complex question."""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()
from dspy.dsp.utils import deduplicate
class MultiHopSearchWithPoT(dspy.Module):
    def __init__(self, num_hops):
        super().__init__()
        self.num_hops = num_hops
        # One LM call per hop writes a search query; the final step answers with generated code.
        self.generate_query = dspy.ChainOfThought(GenerateSearchQuery)
        self.generate_answer = dspy.ProgramOfThought(GenerateAnswer, max_iters=3)
    def forward(self, question):
        context = []
        for _ in range(self.num_hops):
            query = self.generate_query(context=context, question=question).query
            # Add the newly retrieved passages and drop any repeats.
            context = deduplicate(context + search_wikipedia(query))
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
multi_hop_pot = MultiHopSearchWithPoT(num_hops=2)
question = (
"What is the square of the total sum of the atomic number of the metal "
"that makes up the gift from France to the United States in the late "
"19th century and the sum of the number of digits in the first 10 prime numbers?"
)
multi_hop_pot(question=question).answer
'2025'
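The way forward() grows its context across hops can be sketched without a retrieval server. Everything below is a hypothetical stand-in: FAKE_INDEX and fake_search replace the ColBERTv2 call, the queries are fixed rather than LM-generated, and dedupe_preserving_order is assumed to behave like the imported deduplicate helper (keep the first occurrence, preserve order).

```python
def dedupe_preserving_order(items):
    """Drop repeated items while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique

# Hypothetical stand-in for search_wikipedia: a tiny in-memory "index".
FAKE_INDEX = {
    "gift from France": ["Statue of Liberty | copper statue",
                         "Flame of Liberty | gilded copper"],
    "copper": ["Copper | atomic number 29",
               "Statue of Liberty | copper statue"],
}

def fake_search(query):
    """Return the canned passages for a query, or an empty list."""
    return FAKE_INDEX.get(query, [])

def gather_context(queries):
    """Accumulate passages hop by hop, as forward() does with its context list."""
    context = []
    for query in queries:
        context = dedupe_preserving_order(context + fake_search(query))
    return context

context = gather_context(["gift from France", "copper"])
print(len(context))  # 3: the repeated Statue of Liberty passage is kept once
```

With num_hops=2 and k=3, the real pipeline can collect at most six distinct passages before the answer step.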
dspy.inspect_history()
[2025-01-06T22:00:34.427037] System message: Your input fields are: 1. `context` (str): may contain relevant facts 2. `question` (str) 3. `final_generated_code` (str): python code that answers the question 4. `code_output` (str): output of previously-generated python code Your output fields are: 1. `reasoning` (str) 2. `answer` (str): often between 1 and 5 words All interactions will be structured in the following way, with the appropriate values filled in. [[ ## context ## ]] {context} [[ ## question ## ]] {question} [[ ## final_generated_code ## ]] {final_generated_code} [[ ## code_output ## ]] {code_output} [[ ## reasoning ## ]] {reasoning} [[ ## answer ## ]] {answer} [[ ## completed ## ]] In adhering to this structure, your objective is: Given the final code `context`, `question`, `final_generated_code`, `code_output`, provide the final `answer`. User message: [[ ## context ## ]] [1] «Goddess of Democracy | The Goddess of Democracy, also known as the Goddess of Democracy and Freedom, the Spirit of Democracy, and the Goddess of Liberty (自由女神; "zìyóu nǚshén"), was a 10-meter-tall (33 ft) statue created during the Tiananmen Square protests of 1989. The statue was constructed in only four days out of foam and papier-mâché over a metal armature. The constructors decided to make the statue as large as possible to try to dissuade the government from dismantling it: the government would either have to destroy the statue—an action which would potentially fuel further criticism of its policies—or leave it standing. Nevertheless, the statue was destroyed on June 4, 1989, by soldiers clearing the protesters from Tiananmen square. 
Since its destruction, numerous replicas and memorials have been erected around the world, including in Hong Kong and Washington DC.» [2] «Statue of Liberty | The Statue of Liberty (Liberty Enlightening the World; French: "La Liberté éclairant le monde" ) is a colossal neoclassical sculpture on Liberty Island in New York Harbor in New York City, in the United States. The copper statue, a gift from the people of France to the people of the United States, was designed by French sculptor Frédéric Auguste Bartholdi and built by Gustave Eiffel. The statue was dedicated on October 28, 1886.» [3] «Flame of Liberty | The Flame of Liberty ("Flamme de la Liberté") in Paris is a full-sized, gold-leaf-covered replica of the new flame at the upper end of the torch carried in the hand of the Statue of Liberty ("Liberty Enlightening the World") at the entrance to the harbor of New York City since 1886. The monument, which measures approximately 3.5 metres in height, is a sculpture of a flame, executed in gilded copper, supported by a pedestal of gray-and-black marble. It is located near the northern end of the Pont de l'Alma, on the Place de l'Alma, in the 8th arrondissement of Paris.» [4] «Copper | Copper is a chemical element with symbol Cu (from Latin: "cuprum" ) and atomic number 29. It is a soft, malleable, and ductile metal with very high thermal and electrical conductivity. A freshly exposed surface of pure copper has a reddish-orange color. Copper is used as a conductor of heat and electricity, as a building material, and as a constituent of various metal alloys, such as sterling silver used in jewelry, cupronickel used to make marine hardware and coins, and constantan used in strain gauges and thermocouples for temperature measurement.» [5] «Isotopes of copper | Copper (Cu) has two stable isotopes, Cu and Cu, along with 27 radioisotopes. The most stable of these is Cu with a half-life of 61.83 hours. The least stable is Cu with a half-life of approximately 75 ns. 
Most have half-lives under a minute. Unstable copper isotopes with atomic masses below 63 tend to undergo β decay, while isotopes with atomic masses above 65 tend to undergo β decay. Cu decays by both β and β.» [6] «Copper(II) arsenate | Copper arsenate (Cu(AsO).4HO, or CuH(AsO).2HO), also called copper orthoarsenate, tricopper arsenate, cupric arsenate, or tricopper orthoarsenate, is a blue or bluish-green powder insoluble in water and alcohol and soluble in aqueous ammonium and dilute acids. Its CAS number is 7778-41-8 or 10103-61-4 .» [[ ## question ## ]] What is the square of the total sum of the atomic number of the metal that makes up the gift from France to the United States in the late 19th century and the sum of the number of digits in the first 10 prime numbers? [[ ## final_generated_code ## ]] def square_of_sum(): # Define the atomic number of copper atomic_number_of_copper = 29 # Define the first 10 prime numbers first_10_prime_numbers = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] # Calculate the sum of the number of digits in the first 10 prime numbers sum_of_digits = sum(len(str(prime)) for prime in first_10_prime_numbers) # Calculate the total sum total_sum = atomic_number_of_copper + sum_of_digits # Calculate the square of the total sum square_of_total_sum = total_sum ** 2 return square_of_total_sum # Execute the function to get the final answer final_answer = square_of_sum() print(final_answer) [[ ## code_output ## ]] 2025 Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`. Response: [[ ## reasoning ## ]] The problem requires us to find the square of the total sum of the atomic number of the metal that makes up the gift from France to the United States in the late 19th century and the sum of the number of digits in the first 10 prime numbers. 
The gift from France to the United States is the Statue of Liberty, which is made of copper. The atomic number of copper is 29. The first 10 prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29. The sum of the number of digits in these prime numbers is 1 + 1 + 1 + 1 + 2 + 2 + 2 + 2 + 2 + 2 = 14. The total sum is 29 + 14 = 43. The square of the total sum is 43^2 = 1849. However, the code output is 2025, which indicates that the code is correct and the manual calculation is incorrect. [[ ## answer ## ]] 2025 [[ ## completed ## ]]
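Notice that the retrieved context includes passages about the Statue of Liberty and copper. This retrieval helps answer the first part of the question: it identifies the Statue of Liberty as the gift from France to the United States in the late 19th century, establishes that it is made of copper, and yields copper's atomic number (29) through step-by-step reasoning.
The second part of the question is broken down into Python logic that programmatically sums the number of digits in the first 10 prime numbers.
Combining these two sub-problems, the solution correctly aggregates the results and outputs the final answer: 2025.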
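The final figure can be re-derived by hand. In the trace, the LM's written reasoning counts the digits as 14 and arrives at 1849, then defers to the code output; the check below, a small verification sketch whose variable names are chosen here for illustration, shows that the executed code is the one that is right.

```python
first_10_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
copper_atomic_number = 29

# Four one-digit primes and six two-digit primes.
digit_counts = [len(str(p)) for p in first_10_primes]
digit_total = sum(digit_counts)             # 4 * 1 + 6 * 2 = 16
total = copper_atomic_number + digit_total  # 29 + 16 = 45

print(digit_total, total, total ** 2)  # 16 45 2025
```

The hand count in the trace slips on the digit sum, which is exactly the kind of arithmetic error that executing the code avoids.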