本笔记本提供了优化Whisper转录质量的指南。我们将通过修剪和分段来精简您的音频数据,从而提升Whisper的转录质量。转录完成后,我们将通过添加标点符号、调整产品术语(例如将"five two nine"改为"529")以及解决Unicode问题来优化输出结果。这些策略有助于提高转录文本的清晰度,但请注意,根据您的具体使用场景进行定制化调整可能会更有效。
本文已归档,可能包含过时信息。文中所述方法、模型或技术为撰写时的最新内容,但可能已不再代表最佳实践或最新发展。
设置
首先让我们导入几个不同的库:
-
PyDub 是一个简单易用的Python音频处理库,可用于音频切片、拼接和导出等任务。
-
IPython.display模块中的Audio类允许你创建一个音频控件,可以在Jupyter笔记本中播放声音,为直接在笔记本中播放音频数据提供了一种简单的方法。 -
对于我们的音频文件,我们将使用由ChatGPT编写并由作者朗读的虚构财报电话会议内容。这个音频文件相对较短,但希望能为您提供一个示例,展示如何将这些预处理和后处理步骤应用于任何音频文件。
from openai import OpenAI
import os
import urllib
from IPython.display import Audio
from pathlib import Path
from pydub import AudioSegment
import sslclient = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))# set download paths
earnings_call_remote_filepath = "https://cdn.openai.com/API/examples/data/EarningsCall.wav"
# set local save locations
earnings_call_filepath = "data/EarningsCall.wav"
# download example audio files and save locally
ssl._create_default_https_context = ssl._create_unverified_context
urllib.request.urlretrieve(earnings_call_remote_filepath, earnings_call_filepath)('data/EarningsCall.wav', <http.client.HTTPMessage at 0x11be41f50>)有时,开头有长时间静音的文件可能导致Whisper错误地转录音频。我们将使用Pydub来检测并修剪静音部分。
这里,我们将分贝阈值设置为20。您可以根据需要更改此值。
# Function to detect leading silence
# Returns the number of milliseconds until the first sound (chunk averaging more than X decibels)
def milliseconds_until_sound(sound, silence_threshold_in_decibels=-20.0, chunk_size=10):
trim_ms = 0 # ms
assert chunk_size > 0 # to avoid infinite loop
while sound[trim_ms:trim_ms+chunk_size].dBFS < silence_threshold_in_decibels and trim_ms < len(sound):
trim_ms += chunk_size
return trim_ms
def trim_start(filepath):
path = Path(filepath)
directory = path.parent
filename = path.name
audio = AudioSegment.from_file(filepath, format="wav")
start_trim = milliseconds_until_sound(audio)
trimmed = audio[start_trim:]
new_filename = directory / f"trimmed_{filename}"
trimmed.export(new_filename, format="wav")
return trimmed, new_filename
def transcribe_audio(file,output_dir):
audio_path = os.path.join(output_dir, file)
with open(audio_path, 'rb') as audio_data:
transcription = client.audio.transcriptions.create(
model="whisper-1", file=audio_data)
return transcription.text有时我们会在转录文本中看到Unicode字符注入问题,移除所有非ASCII字符应有助于缓解此问题。
请注意,如果您要转录希腊语、西里尔字母、阿拉伯语、中文等语言,不应使用此函数
# Define function to remove non-ascii characters
def remove_non_ascii(text):
return ''.join(i for i in text if ord(i)<128)
该函数将为我们的转录文本添加格式和标点符号。Whisper生成的转录文本包含标点符号但不含格式。
# Define function to add punctuation
def punctuation_assistant(ascii_transcript):
system_prompt = """You are a helpful assistant that adds punctuation to text.
Preserve the original words and only insert necessary punctuation such as periods,
commas, capialization, symbols like dollar sings or percentage signs, and formatting.
Use only the context provided. If there is no context provided say, 'No context provided'\n"""
response = client.chat.completions.create(
model="gpt-3.5-turbo",
temperature=0,
messages=[
{
"role": "system",
"content": system_prompt
},
{
"role": "user",
"content": ascii_transcript
}
]
)
return response
我们的音频文件是一段模拟财报电话会议的录音,其中包含大量金融产品名称。该功能可确保如果Whisper将这些金融产品名称转录错误时,能够进行纠正。
# Define function to fix product mispellings
def product_assistant(ascii_transcript):
system_prompt = """You are an intelligent assistant specializing in financial products;
your task is to process transcripts of earnings calls, ensuring that all references to
financial products and common financial terms are in the correct format. For each
financial product or common term that is typically abbreviated as an acronym, the full term
should be spelled out followed by the acronym in parentheses. For example, '401k' should be
transformed to '401(k) retirement savings plan', 'HSA' should be transformed to 'Health Savings Account (HSA)'
, 'ROA' should be transformed to 'Return on Assets (ROA)', 'VaR' should be transformed to 'Value at Risk (VaR)'
, and 'PB' should be transformed to 'Price to Book (PB) ratio'. Similarly, transform spoken numbers representing
financial products into their numeric representations, followed by the full name of the product in parentheses.
For instance, 'five two nine' to '529 (Education Savings Plan)' and 'four zero one k' to '401(k) (Retirement Savings Plan)'.
However, be aware that some acronyms can have different meanings based on the context (e.g., 'LTV' can stand for
'Loan to Value' or 'Lifetime Value'). You will need to discern from the context which term is being referred to
and apply the appropriate transformation. In cases where numerical figures or metrics are spelled out but do not
represent specific financial products (like 'twenty three percent'), these should be left as is. Your role is to
analyze and adjust financial product terminology in the text. Once you've done that, produce the adjusted
transcript and a list of the words you've changed"""
response = client.chat.completions.create(
model="gpt-4",
temperature=0,
messages=[
{
"role": "system",
"content": system_prompt
},
{
"role": "user",
"content": ascii_transcript
}
]
)
return response
此函数将创建一个新文件,文件名在原文件名后附加'trimmed'
# Trim the start of the original audio file
trimmed_audio = trim_start(earnings_call_filepath)
trimmed_audio, trimmed_filename = trim_start(earnings_call_filepath)
我们的虚假财报音频文件时长较短,因此我们会相应调整分段。请注意,您可以根据需要调整分段长度。
# Segment audio
trimmed_audio = AudioSegment.from_wav(trimmed_filename) # Load the trimmed audio file
one_minute = 1 * 60 * 1000 # Duration for each segment (in milliseconds)
start_time = 0 # Start time for the first segment
i = 0 # Index for naming the segmented files
output_dir_trimmed = "trimmed_earnings_directory" # Output directory for the segmented files
if not os.path.isdir(output_dir_trimmed): # Create the output directory if it does not exist
os.makedirs(output_dir_trimmed)
while start_time < len(trimmed_audio): # Loop over the trimmed audio file
segment = trimmed_audio[start_time:start_time + one_minute] # Extract a segment
segment.export(os.path.join(output_dir_trimmed, f"trimmed_{i:02d}.wav"), format="wav") # Save the segment
start_time += one_minute # Update the start time for the next segment
i += 1 # Increment the index for naming the next file
# Get list of trimmed and segmented audio files and sort them numerically
audio_files = sorted(
(f for f in os.listdir(output_dir_trimmed) if f.endswith(".wav")),
key=lambda f: int(''.join(filter(str.isdigit, f)))
)
# Use a loop to apply the transcribe function to all audio files
transcriptions = [transcribe_audio(file, output_dir_trimmed) for file in audio_files]
# Concatenate the transcriptions
full_transcript = ' '.join(transcriptions)print(full_transcript)Good afternoon, everyone. And welcome to FinTech Plus Sync's second quarter 2023 earnings call. I'm John Doe, CEO of FinTech Plus. We've had a stellar Q2 with a revenue of 125 million, a 25% increase year over year. Our gross profit margin stands at a solid 58%, due in part to cost efficiencies gained from our scalable business model. Our EBITDA has surged to 37.5 million, translating to a remarkable 30% EBITDA margin. Our net income for the quarter rose to 16 million, which is a noteworthy increase from 10 million in Q2 2022. Our total addressable market has grown substantially thanks to the expansion of our high yield savings product line and the new RoboAdvisor platform. We've been diversifying our asset-backed securities portfolio, investing heavily in collateralized. debt obligations, and residential mortgage-backed securities. We've also invested $25 million in AAA rated corporate bonds, enhancing our risk adjusted returns. As for our balance sheet, total assets reached $1.5 billion with total liabilities at $900 million, leaving us with a solid equity base of $600 million. Our debt-to-equity ratio stands at 1.5, a healthy figure considering our expansionary phase. We continue to see substantial organic user growth, with customer acquisition cost dropping by 15% and lifetime value growing by 25%. Our LTVCAC ratio is at an impressive 3.5%. In terms of risk management, we have a value-at-risk model in place with a 99%... confidence level indicating that our maximum loss will not exceed 5 million in the next trading day. We've adopted a conservative approach to managing our leverage and have a healthy tier one capital ratio of 12.5%. Our forecast for the coming quarter is positive. We expect revenue to be around 135 million and 8% quarter over quarter growth driven primarily by our cutting edge blockchain solutions and AI driven predictive analytics. We're also excited about the upcoming IPO of our FinTech subsidiary Pay Plus, which we expect to raise 200 million, significantly bolstering our liquidity and paving the way for aggressive growth strategies. We thank our shareholders for their continued faith in us and we look forward to an even more successful Q3. Thank you so much.
# Remove non-ascii characters from the transcript
ascii_transcript = remove_non_ascii(full_transcript)print(ascii_transcript)Good afternoon, everyone. And welcome to FinTech Plus Sync's second quarter 2023 earnings call. I'm John Doe, CEO of FinTech Plus. We've had a stellar Q2 with a revenue of 125 million, a 25% increase year over year. Our gross profit margin stands at a solid 58%, due in part to cost efficiencies gained from our scalable business model. Our EBITDA has surged to 37.5 million, translating to a remarkable 30% EBITDA margin. Our net income for the quarter rose to 16 million, which is a noteworthy increase from 10 million in Q2 2022. Our total addressable market has grown substantially thanks to the expansion of our high yield savings product line and the new RoboAdvisor platform. We've been diversifying our asset-backed securities portfolio, investing heavily in collateralized. debt obligations, and residential mortgage-backed securities. We've also invested $25 million in AAA rated corporate bonds, enhancing our risk adjusted returns. As for our balance sheet, total assets reached $1.5 billion with total liabilities at $900 million, leaving us with a solid equity base of $600 million. Our debt-to-equity ratio stands at 1.5, a healthy figure considering our expansionary phase. We continue to see substantial organic user growth, with customer acquisition cost dropping by 15% and lifetime value growing by 25%. Our LTVCAC ratio is at an impressive 3.5%. In terms of risk management, we have a value-at-risk model in place with a 99%... confidence level indicating that our maximum loss will not exceed 5 million in the next trading day. We've adopted a conservative approach to managing our leverage and have a healthy tier one capital ratio of 12.5%. Our forecast for the coming quarter is positive. We expect revenue to be around 135 million and 8% quarter over quarter growth driven primarily by our cutting edge blockchain solutions and AI driven predictive analytics. We're also excited about the upcoming IPO of our FinTech subsidiary Pay Plus, which we expect to raise 200 million, significantly bolstering our liquidity and paving the way for aggressive growth strategies. We thank our shareholders for their continued faith in us and we look forward to an even more successful Q3. Thank you so much.
# Use punctuation assistant function
response = punctuation_assistant(ascii_transcript)# Extract the punctuated transcript from the model's response
punctuated_transcript = response.choices[0].message.content
print(punctuated_transcript)Good afternoon, everyone. And welcome to FinTech Plus Sync's second quarter 2023 earnings call. I'm John Doe, CEO of FinTech Plus. We've had a stellar Q2 with a revenue of $125 million, a 25% increase year over year. Our gross profit margin stands at a solid 58%, due in part to cost efficiencies gained from our scalable business model. Our EBITDA has surged to $37.5 million, translating to a remarkable 30% EBITDA margin. Our net income for the quarter rose to $16 million, which is a noteworthy increase from $10 million in Q2 2022. Our total addressable market has grown substantially thanks to the expansion of our high yield savings product line and the new RoboAdvisor platform. We've been diversifying our asset-backed securities portfolio, investing heavily in collateralized debt obligations, and residential mortgage-backed securities. We've also invested $25 million in AAA rated corporate bonds, enhancing our risk-adjusted returns. As for our balance sheet, total assets reached $1.5 billion with total liabilities at $900 million, leaving us with a solid equity base of $600 million. Our debt-to-equity ratio stands at 1.5, a healthy figure considering our expansionary phase. We continue to see substantial organic user growth, with customer acquisition cost dropping by 15% and lifetime value growing by 25%. Our LTVCAC ratio is at an impressive 3.5%. In terms of risk management, we have a value-at-risk model in place with a 99% confidence level indicating that our maximum loss will not exceed $5 million in the next trading day. We've adopted a conservative approach to managing our leverage and have a healthy tier one capital ratio of 12.5%. Our forecast for the coming quarter is positive. We expect revenue to be around $135 million and 8% quarter over quarter growth driven primarily by our cutting-edge blockchain solutions and AI-driven predictive analytics. We're also excited about the upcoming IPO of our FinTech subsidiary Pay Plus, which we expect to raise $200 million, significantly bolstering our liquidity and paving the way for aggressive growth strategies. We thank our shareholders for their continued faith in us and we look forward to an even more successful Q3. Thank you so much.
# Use product assistant function
response = product_assistant(punctuated_transcript)
# Extract the final transcript from the model's response
final_transcript = response.choices[0].message.contentprint(final_transcript)Good afternoon, everyone. And welcome to FinTech Plus Sync's second quarter 2023 earnings call. I'm John Doe, CEO of FinTech Plus. We've had a stellar second quarter (Q2) with a revenue of $125 million, a 25% increase year over year. Our gross profit margin stands at a solid 58%, due in part to cost efficiencies gained from our scalable business model. Our Earnings Before Interest, Taxes, Depreciation, and Amortization (EBITDA) has surged to $37.5 million, translating to a remarkable 30% EBITDA margin. Our net income for the quarter rose to $16 million, which is a noteworthy increase from $10 million in second quarter (Q2) 2022. Our total addressable market has grown substantially thanks to the expansion of our high yield savings product line and the new RoboAdvisor platform. We've been diversifying our asset-backed securities portfolio, investing heavily in Collateralized Debt Obligations (CDOs), and Residential Mortgage-Backed Securities (RMBS). We've also invested $25 million in AAA rated corporate bonds, enhancing our risk-adjusted returns. As for our balance sheet, total assets reached $1.5 billion with total liabilities at $900 million, leaving us with a solid equity base of $600 million. Our Debt-to-Equity (D/E) ratio stands at 1.5, a healthy figure considering our expansionary phase. We continue to see substantial organic user growth, with Customer Acquisition Cost (CAC) dropping by 15% and Lifetime Value (LTV) growing by 25%. Our LTV to CAC (LTVCAC) ratio is at an impressive 3.5%. In terms of risk management, we have a Value at Risk (VaR) model in place with a 99% confidence level indicating that our maximum loss will not exceed $5 million in the next trading day. We've adopted a conservative approach to managing our leverage and have a healthy Tier 1 Capital ratio of 12.5%. Our forecast for the coming quarter is positive. We expect revenue to be around $135 million and 8% quarter over quarter growth driven primarily by our cutting-edge blockchain solutions and AI-driven predictive analytics. We're also excited about the upcoming Initial Public Offering (IPO) of our FinTech subsidiary Pay Plus, which we expect to raise $200 million, significantly bolstering our liquidity and paving the way for aggressive growth strategies. We thank our shareholders for their continued faith in us and we look forward to an even more successful third quarter (Q3). Thank you so much. Words Changed: 1. Q2 -> second quarter (Q2) 2. EBITDA -> Earnings Before Interest, Taxes, Depreciation, and Amortization (EBITDA) 3. Q2 2022 -> second quarter (Q2) 2022 4. CDOs -> Collateralized Debt Obligations (CDOs) 5. RMBS -> Residential Mortgage-Backed Securities (RMBS) 6. D/E -> Debt-to-Equity (D/E) 7. CAC -> Customer Acquisition Cost (CAC) 8. LTV -> Lifetime Value (LTV) 9. LTVCAC -> LTV to CAC (LTVCAC) 10. VaR -> Value at Risk (VaR) 11. IPO -> Initial Public Offering (IPO) 12. Q3 -> third quarter (Q3)