speechbrain.inference.diarization module

Specifies the inference interfaces for diarization modules.

Authors:

Aku Rouhe 2021
Peter Plantinga 2021
Loren Lugosch 2020
Mirco Ravanelli 2020
Titouan Parcollet 2021
Abdel Heba 2021
Andreas Nautsch 2022, 2023
Pooneh Mousavi 2023
Sylvain de Langen 2023
Adel Moumen 2023
Pradnya Kandarkar 2023

Summary

Classes:

Speech_Emotion_Diarization

A ready-to-use SED (speech emotion diarization) interface (audio -> emotions and their durations)

Reference

class speechbrain.inference.diarization.Speech_Emotion_Diarization(modules=None, hparams=None, run_opts=None, freeze_params=True)[source]

Bases: Pretrained

A ready-to-use SED interface (audio -> emotions and their durations)

Parameters:: See Pretrained

Example

>>> from speechbrain.inference.diarization import Speech_Emotion_Diarization
>>> tmpdir = getfixture("tmpdir")
>>> sed_model = Speech_Emotion_Diarization.from_hparams(source="speechbrain/emotion-diarization-wavlm-large", savedir=tmpdir,) 
>>> sed_model.diarize_file("speechbrain/emotion-diarization-wavlm-large/example.wav") 

MODULES_NEEDED = ['input_norm', 'wav2vec', 'output_mlp']

diarize_file(path)[source]

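Get the emotion diarization of a spoken utterance.

Parameters:: path (str) – Path to the audio file to diarize.
Returns:: list of dictionary – The emotions and their temporal boundaries.
Return type:: List[Dict[List]]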

encode_batch(wavs, wav_lens)[source]

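Encodes audio into fine-grained emotional embeddings.

Parameters:

wavs (torch.Tensor) – Batch of waveforms [batch, time, channels].
wav_lens (torch.Tensor) – Lengths of the waveforms relative to the longest one in the batch, as a tensor of shape [batch]. The longest waveform should have relative length 1.0 and the others len(waveform) / max_length. Used for ignoring padding.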

Returns:

The encoded batch

Return type:

torch.Tensor
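The relative-length convention described for wav_lens can be illustrated with a small sketch. Plain Python lists stand in for tensors here, and the helper names pad_batch and relative_lengths are hypothetical; they are not part of SpeechBrain.

```python
def pad_batch(waveforms, pad_value=0.0):
    """Right-pad every waveform to the length of the longest one (hypothetical helper)."""
    max_length = max(len(w) for w in waveforms)
    return [list(w) + [pad_value] * (max_length - len(w)) for w in waveforms]


def relative_lengths(num_samples):
    """Divide each length by the longest length, so the longest waveform gets 1.0."""
    max_length = max(num_samples)
    return [n / max_length for n in num_samples]


# Three made-up mono waveforms of different lengths (in samples).
waveforms = [[0.1] * 16000, [0.2] * 8000, [0.3] * 12000]
padded = pad_batch(waveforms)
wav_lens = relative_lengths([len(w) for w in waveforms])
print(wav_lens)  # [1.0, 0.5, 0.75]
```

In real use the padded batch and the relative lengths would be converted to torch.Tensor objects before being passed as wavs and wav_lens.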

diarize_batch(wavs, wav_lens, batch_id)[source]

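Get the emotion diarization of a batch of waveforms.

The waveforms should already be in the model's desired format. You can call normalized = EncoderDecoderASR.normalizer(signal, sample_rate) to get a correctly converted signal in most cases.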

Parameters:

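wavs (torch.Tensor) – Batch of waveforms [batch, time, channels].
wav_lens (torch.Tensor) – Lengths of the waveforms relative to the longest one in the batch, as a tensor of shape [batch]. The longest waveform should have relative length 1.0 and the others len(waveform) / max_length. Used for ignoring padding.
batch_id (torch.Tensor) – ID of each batch (file names, etc.)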

Returns:

list of dictionary – The emotions and their temporal boundaries.

Return type:

List[Dict[List]]

preds_to_diarization(prediction, batch_id)[source]

Converts frame-wise predictions into a dictionary of diarization results.

Parameters:

prediction (torch.Tensor) – Frame-wise predictions
batch_id (str) – The ID of this batch

Returns:

A dictionary with the start/end of each emotion

Return type:

dict
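The idea of turning frame-wise predictions into timed segments can be sketched as follows. This is a minimal illustration and not the library's implementation: the function name frames_to_segments, the frame_duration parameter, and the "start"/"end"/"emotion" keys are all assumptions made for the example.

```python
def frames_to_segments(labels, frame_duration):
    """Group runs of identical frame labels into timed segments.

    `labels` is a list with one emotion label per frame and `frame_duration`
    is the assumed length of one frame in seconds (both hypothetical inputs).
    """
    segments = []
    start_index = 0
    for index in range(1, len(labels) + 1):
        # Close the current run at the end of the input or when the label changes.
        if index == len(labels) or labels[index] != labels[start_index]:
            segments.append(
                {
                    "start": start_index * frame_duration,
                    "end": index * frame_duration,
                    "emotion": labels[start_index],
                }
            )
            start_index = index
    return segments


# Six made-up frames of 0.5 s each: neutral, neutral, angry x3, neutral.
example = frames_to_segments(["n", "n", "a", "a", "a", "n"], frame_duration=0.5)
for segment in example:
    print(segment)
```

With these inputs the sketch yields three segments: "n" from 0.0 to 1.0, "a" from 1.0 to 2.5, and "n" from 2.5 to 3.0.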

forward(wavs, wav_lens, batch_id)[source]: Get the emotion diarization of a batch of waveforms.

is_overlapped(end1, start2)[source]

Returns True if the segments overlap.

Parameters:

end1 (float) – End time of the first segment.
start2 (float) – Start time of the second segment.

Returns:

overlapped – True if the segments overlap, otherwise False.

Return type:

bool

Example

>>> from speechbrain.processing import diarization as diar
>>> diar.is_overlapped(5.5, 3.4)
True
>>> diar.is_overlapped(5.5, 6.4)
False
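The check in the example above amounts to a single comparison. The sketch below is a stand-alone illustration under one assumption: segments that merely touch (start2 == end1) are treated as overlapping. The documentation above does not state this boundary case, and the name segments_overlap is hypothetical.

```python
def segments_overlap(end1, start2):
    """Return True when the second segment starts no later than the first one ends.

    Assumption: touching segments (start2 == end1) count as overlapping.
    """
    return start2 <= end1


print(segments_overlap(5.5, 3.4))  # True
print(segments_overlap(5.5, 6.4))  # False
```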

merge_ssegs_same_emotion_adjacent(lol)[source]

Merges adjacent sub-segments if they have the same emotion.

Parameters:: lol (list of list) – Each list contains [utt_id, sseg_start, sseg_end, emo_label].
Returns:: new_lol – The adjacent sub-segments that share the same emotion label, merged.
Return type:: list of list

Example

>>> from speechbrain.utils.EDER import merge_ssegs_same_emotion_adjacent
>>> lol=[['u1', 0.0, 7.0, 'a'],
... ['u1', 7.0, 9.0, 'a'],
... ['u1', 9.0, 11.0, 'n'],
... ['u1', 11.0, 13.0, 'n'],
... ['u1', 13.0, 15.0, 'n'],
... ['u1', 15.0, 16.0, 'a']]
>>> merge_ssegs_same_emotion_adjacent(lol)
[['u1', 0.0, 9.0, 'a'], ['u1', 9.0, 15.0, 'n'], ['u1', 15.0, 16.0, 'a']]
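The merging behaviour shown in the example can be approximated with a short pure-Python sketch. It is not the library's code: the name merge_adjacent_same_emotion is hypothetical, and it assumes that two sub-segments are merged only when they belong to the same utterance, carry the same label, and the later one starts no later than the earlier one ends.

```python
def merge_adjacent_same_emotion(lol):
    """Merge consecutive [utt_id, start, end, label] entries that share a label.

    Assumes `lol` is already sorted by start time within each utterance.
    The input lists are not modified.
    """
    merged = []
    for utt_id, start, end, emotion in lol:
        if merged:
            prev = merged[-1]
            same_utt = prev[0] == utt_id
            same_emotion = prev[3] == emotion
            touching = start <= prev[2]
            if same_utt and same_emotion and touching:
                # Extend the previous segment instead of starting a new one.
                prev[2] = max(prev[2], end)
                continue
        merged.append([utt_id, start, end, emotion])
    return merged


lol = [
    ["u1", 0.0, 7.0, "a"],
    ["u1", 7.0, 9.0, "a"],
    ["u1", 9.0, 11.0, "n"],
    ["u1", 11.0, 13.0, "n"],
    ["u1", 13.0, 15.0, "n"],
    ["u1", 15.0, 16.0, "a"],
]
print(merge_adjacent_same_emotion(lol))
# [['u1', 0.0, 9.0, 'a'], ['u1', 9.0, 15.0, 'n'], ['u1', 15.0, 16.0, 'a']]
```

Building new inner lists keeps the caller's data untouched, which makes the sketch safe to run on segments that are reused elsewhere.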