2023年8月11日

解决转录拼写错误:即时提示与后处理对比

本文已归档,可能包含过时信息。文中所述方法、模型或技术为撰写时的最新内容,但可能已不再代表最佳实践或最新发展。

我们正在解决提高转录准确性的问题,尤其是涉及公司名称和产品引用时。我们的解决方案采用双重策略,同时利用Whisper提示参数和GPT-4的后处理能力。

纠正不准确性的两种方法是:

  • 我们直接将一系列正确拼写输入到Whisper的提示参数中,以指导初始转录。

  • 我们利用GPT-4在转录后修正拼写错误,同样在提示中使用了相同的正确拼写列表。

这些策略旨在确保不熟悉专有名词的准确转录。

设置

让我们开始吧:

  • 导入OpenAI Python库(如果尚未安装,需通过pip install openai命令安装)
  • 下载音频文件示例
# imports
from openai import OpenAI  # for making OpenAI API calls
import urllib  # for downloading example audio files
import os  # for accessing environment variables

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
# set download paths
ZyntriQix_remote_filepath = "https://cdn.openai.com/API/examples/data/ZyntriQix.wav"


# set local save locations
ZyntriQix_filepath = "data/ZyntriQix.wav"

# download example audio files and save locally
urllib.request.urlretrieve(ZyntriQix_remote_filepath, ZyntriQix_filepath)
('data/ZyntriQix.wav', <http.client.HTTPMessage at 0x10559a910>)

通过虚构的音频录音设定基准线

我们的参考点是一段独白,它是由作者提供的提示词通过ChatGPT生成的。随后,作者为这段内容进行了配音。因此,作者既通过提示词引导了ChatGPT的输出,又通过语音演绎使其生动呈现。

我们虚构的公司ZyntriQix提供一系列科技产品。这些产品包括Digique Plus、CynapseFive、VortiQore V8、EchoNix Array、OrbitalLink Seven和DigiFractal Matrix。我们还主导了多项计划,如PULSE、RAPT、B.R.I.C.K.、Q.U.A.R.T.Z.和F.L.I.N.T.

# define a wrapper function for seeing how prompts affect transcriptions
def transcribe(prompt: str, audio_filepath) -> str:
    """Given a prompt, transcribe the audio file."""
    transcript = client.audio.transcriptions.create(
        file=open(audio_filepath, "rb"),
        model="whisper-1",
        prompt=prompt,
    )
    return transcript.text
# baseline transcription with no prompt
transcribe(prompt="", audio_filepath=ZyntriQix_filepath)
"Have you heard of ZentricX? This tech giant boasts products like Digi-Q+, Synapse 5, VortiCore V8, Echo Nix Array, and not to forget the latest Orbital Link 7 and Digifractal Matrix. Their innovation arsenal also includes the Pulse framework, Wrapped system, they've developed a brick infrastructure court system, and launched the Flint initiative, all highlighting their commitment to relentless innovation. ZentricX, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"

Whisper错误地转录了我们公司的名称、产品名称,并且错误地大写我们的缩写词。让我们在提示中将正确的名称作为列表传递。

# add the correct spelling names to the prompt
transcribe(
    prompt="ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T.",
    audio_filepath=ZyntriQix_filepath,
)
"Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system. They've developed a B.R.I.C.K. infrastructure, Q.U.A.R.T. system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix in just 30 years has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"

在传递产品名称列表时,部分产品名称被正确转录,而其他名称仍然存在拼写错误。

# add a full product list to the prompt
transcribe(
    prompt="ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T.",
    audio_filepath=ZyntriQix_filepath,
)
"Have you heard of ZentricX? This tech giant boasts products like DigiCube Plus, Synapse 5, VortiCore V8, EchoNix Array, and not to forget the latest Orbital Link 7 and Digifractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system. They've developed a brick infrastructure court system and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZentricX in just 30 years has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"

你可以使用GPT-4来修正拼写错误

当演讲内容事先未知且我们手头有一份现成的产品名称列表时,利用GPT-4就显得尤为有用。

利用GPT-4的后处理技术明显比仅依赖Whisper的提示参数更具扩展性,后者的标记限制为244个。GPT-4使我们能够处理更长的正确拼写列表,这使其成为处理大量产品列表的更强大方法。

然而,这种后处理技术并非没有局限性。它受限于所选模型的上下文窗口,在处理大量独特术语时可能会带来挑战。例如,拥有数千个SKU的公司可能会发现GPT-4的上下文窗口无法满足其需求,他们可能需要探索替代解决方案。

有趣的是,GPT-4后处理技术似乎比单独使用Whisper更可靠。这种利用产品列表的方法提高了我们结果的可靠性。然而,这种可靠性的提升是有代价的,因为使用这种方法可能会增加成本并导致更高的延迟。

# define a wrapper function for seeing how prompts affect transcriptions
def transcribe_with_spellcheck(system_message, audio_filepath):
    completion = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": system_message},
            {
                "role": "user",
                "content": transcribe(prompt="", audio_filepath=audio_filepath),
            },
        ],
    )
    return completion.choices[0].message.content

现在,让我们将原始产品列表输入GPT-4并评估其表现。通过这种方式,我们旨在评估这个AI模型正确拼写专有产品名称的能力,即使它事先并不知晓转录文本中会出现哪些具体术语。在我们的实验中,GPT-4成功正确地拼写出了我们的产品名称,这证实了它作为确保转录准确性的可靠工具的潜力。

system_prompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T."
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?

在这种情况下,我们提供了一个全面的产品列表,其中包含所有之前使用过的拼写变体以及新增的名称。这个场景模拟了现实情况:当我们拥有大量SKU清单且不确定转录文本中会出现哪些确切术语时。将这份详尽的产品名称列表输入系统后,最终输出了正确的转录结果。

system_prompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array,  OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, GigaLink TwentySeven, FusionMatrix TwentyEight, InfiniFractal TwentyNine, MetaSync Thirty, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T. Only add necessary punctuation such as periods, commas, and capitalization, and use only the context provided."
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?

我们正在使用GPT-4作为拼写检查工具,采用与之前提示中相同的正确拼写列表。

system_prompt = "You are a helpful assistant for the company ZyntriQix. Your first task is to list the words that are not spelled correctly according to the list provided to you and to tell me the number of misspelled words. Your next task is to insert those correct words in place of the misspelled ones. List: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array,  OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, GigaLink TwentySeven, FusionMatrix TwentyEight, InfiniFractal TwentyNine, MetaSync Thirty, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T."
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
The misspelled words are: ZentricX, Digi-Q+, Synapse 5, VortiCore V8, Echo Nix Array, Orbital Link 7, Digifractal Matrix, Pulse, Wrapped, brick, Flint, and 30. The total number of misspelled words is 12.

The corrected paragraph is:

Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just MetaSync Thirty years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?