speechbrain.processing.multi_mic module

Multi-microphone components.

This library contains functions for multi-microphone signal processing.

Example

>>> import torch
>>>
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, SrpPhat, Music
>>> from speechbrain.processing.multi_mic import DelaySum, Mvdr, Gev
>>>
>>> xs_speech = read_audio(
...    'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise_diff = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs_noise_diff = xs_noise_diff.unsqueeze(0)
>>> xs_noise_loc = read_audio('tests/samples/multi-mic/noise_0.70225_-0.70225_0.11704.flac')
>>> xs_noise_loc = xs_noise_loc.unsqueeze(0)
>>> fs = 16000 # sampling rate
>>> ss = xs_speech
>>> nn_diff = 0.05 * xs_noise_diff
>>> nn_loc = 0.05 * xs_noise_loc
>>> xs_diffused_noise = ss + nn_diff
>>> xs_localized_noise = ss + nn_loc
>>> # Delay-and-Sum Beamforming with GCC-PHAT localization
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> delaysum = DelaySum()
>>> istft = ISTFT(sample_rate=fs)
>>> Xs = stft(xs_diffused_noise)
>>> Ns = stft(nn_diff)
>>> XXs = cov(Xs)
>>> NNs = cov(Ns)
>>> tdoas = gccphat(XXs)
>>> Ys_ds = delaysum(Xs, tdoas)
>>> ys_ds = istft(Ys_ds)
>>> # Mvdr Beamforming with SRP-PHAT localization
>>> mvdr = Mvdr()
>>> mics = torch.zeros((4,3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> srpphat = SrpPhat(mics=mics)
>>> doas = srpphat(XXs)
>>> Ys_mvdr = mvdr(Xs, NNs, doas, doa_mode=True, mics=mics, fs=fs)
>>> ys_mvdr = istft(Ys_mvdr)
>>> # Mvdr Beamforming with MUSIC localization
>>> music = Music(mics=mics)
>>> doas = music(XXs)
>>> Ys_mvdr2 = mvdr(Xs, NNs, doas, doa_mode=True, mics=mics, fs=fs)
>>> ys_mvdr2 = istft(Ys_mvdr2)
>>> # GeV Beamforming
>>> gev = Gev()
>>> Xs = stft(xs_localized_noise)
>>> Ss = stft(ss)
>>> Ns = stft(nn_loc)
>>> SSs = cov(Ss)
>>> NNs = cov(Ns)
>>> Ys_gev = gev(Xs, SSs, NNs)
>>> ys_gev = istft(Ys_gev)
Authors:
  • William Aris

  • Francois Grondin

Summary

Classes:

Covariance

Computes the covariance matrices of the signals.

DelaySum

Performs delay-and-sum beamforming by using the TDOAs and the first channel as a reference.

GccPhat

Generalized Cross-Correlation with Phase Transform localization.

Gev

Generalized EigenValue decomposition (GEV) beamforming.

Music

Multiple Signal Classification (MUSIC) localization.

Mvdr

Performs Minimum Variance Distortionless Response (MVDR) beamforming by using an input signal in the frequency domain, its covariance matrices and the tdoas (to compute a steering vector).

SrpPhat

Steered-Response Power with Phase Transform localization.

Functions:

doas2taus

This function converts directions of arrival (xyz coordinates expressed in meters) into time differences of arrival (expressed in samples).

sphere

This function generates cartesian coordinates (xyz) for a set of points forming a 3D sphere.

steering

This function computes steering vectors by using the time differences of arrival for each channel (expressed in samples) and the number of bins (n_fft).

tdoas2taus

This function selects the tdoas of each channel and puts them in a tensor.

Reference

class speechbrain.processing.multi_mic.Covariance(average=True)[source]

Bases: Module

Computes the covariance matrices of the signals.

Parameters:

average (bool) – Informs the module if it should return an average (computed on the time dimension) of the covariance matrices. The default value is True.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>>
>>> xs_speech = read_audio(
...    'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0)
>>> xs = xs_speech + 0.05 * xs_noise
>>> fs = 16000
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>>
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> XXs.shape
torch.Size([1, 1001, 201, 2, 10])
forward(Xs)[source]

This method uses the utility function _cov to compute covariance matrices. Therefore, the result has the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics + n_pairs).

The order on the last dimension corresponds to the triu_indices for a square matrix. For instance, if we have 4 channels, we get the following order: (0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3) and (3, 3). Therefore, XXs[…, 0] corresponds to channels (0, 0) and XXs[…, 1] corresponds to channels (0, 1).

Parameters:

Xs (torch.Tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics)
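
The triu_indices ordering described above can be reproduced in a few lines of plain Python (an illustration, not part of the module):

```python
def triu_pairs(n_mics):
    """Return the (i, j) channel pairs in upper-triangular order."""
    return [(i, j) for i in range(n_mics) for j in range(i, n_mics)]

# For 4 channels: 4 auto-terms + 6 cross-terms = 10 entries, matching
# the n_mics + n_pairs size of the last covariance dimension.
pairs = triu_pairs(4)
```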

class speechbrain.processing.multi_mic.DelaySum[source]

Bases: Module

Performs delay-and-sum beamforming by using the TDOAs and the first channel as a reference.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, DelaySum
>>>
>>> xs_speech = read_audio(
...    'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channel]
>>> xs_noise  = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0) #[batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>>
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> delaysum = DelaySum()
>>> istft = ISTFT(sample_rate=fs)
>>>
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> Ys = delaysum(Xs, tdoas)
>>> ys = istft(Ys)
forward(Xs, localization_tensor, doa_mode=False, mics=None, fs=None, c=343.0)[source]

This method computes a steering vector by using the TDOAs/DOAs and then calls the utility function _delaysum to perform beamforming. The result has the following format: (batch, time_step, n_fft, 2, 1).

Parameters:
  • Xs (torch.Tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics)

  • localization_tensor (torch.Tensor) – A tensor containing either the time differences of arrival (TDOAs) (in samples) or the directions of arrival (DOAs) (xyz coordinates in meters) for each timestamp. If localization_tensor represents TDOAs, then its format is (batch, time_steps, n_mics + n_pairs). If localization_tensor represents DOAs, then its format is (batch, time_steps, 3)

  • doa_mode (bool) – Set this parameter to True if localization_tensor represents DOAs instead of TDOAs. Its default value is False.

  • mics (torch.Tensor) – The cartesian position (xyz coordinates in meters) of each microphone. The tensor must have the following format (n_mics, 3). This parameter is only mandatory when localization_tensor represents DOAs.

  • fs (int) – The sample rate of the signals in Hertz. This parameter is only mandatory when localization_tensor represents DOAs.

  • c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s. This parameter is only used when localization_tensor represents DOAs.

Returns:

Ys

Return type:

torch.Tensor
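
The principle behind the beamformer can be sketched for a single frame with NumPy (a simplified illustration, not the internal _delaysum routine; it assumes the usual convention that a channel delayed by tau samples carries a phase factor e^{-j 2 pi k tau / n_fft} in bin k):

```python
import numpy as np

def delaysum_sketch(Xs, taus, n_fft):
    """Xs: (n_fft/2 + 1, n_mics) complex spectra; taus: delays in samples."""
    k = np.arange(Xs.shape[0])
    # Steering vectors, one per bin and channel
    A = np.exp(-1j * 2.0 * np.pi * np.outer(k, taus) / n_fft)
    # Multiplying by the conjugate re-aligns the channels; then average
    return np.mean(np.conj(A) * Xs, axis=1)
```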

class speechbrain.processing.multi_mic.Mvdr(eps=1e-20)[source]

Bases: Module

Performs Minimum Variance Distortionless Response (MVDR) beamforming by using an input signal in the frequency domain, its covariance matrices and the tdoas (to compute a steering vector).

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, DelaySum
>>>
>>> xs_speech = read_audio(
...    'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channel]
>>> xs_noise  = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0) #[batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>>
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> mvdr = Mvdr()
>>> istft = ISTFT(sample_rate=fs)
>>>
>>> Xs = stft(xs)
>>> Ns = stft(xs_noise)
>>> XXs = cov(Xs)
>>> NNs = cov(Ns)
>>> tdoas = gccphat(XXs)
>>> Ys = mvdr(Xs, NNs, tdoas)
>>> ys = istft(Ys)
forward(Xs, NNs, localization_tensor, doa_mode=False, mics=None, fs=None, c=343.0)[source]

This method computes a steering vector before using the utility function _mvdr to perform beamforming. The result has the following format: (batch, time_step, n_fft, 2, 1).

Parameters:
  • Xs (torch.Tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics)

  • NNs (torch.Tensor) – The covariance matrices of the noise signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs)

  • localization_tensor (torch.Tensor) – A tensor containing either the time differences of arrival (TDOAs) (in samples) or the directions of arrival (DOAs) (xyz coordinates in meters) for each timestamp. If localization_tensor represents TDOAs, then its format is (batch, time_steps, n_mics + n_pairs). If localization_tensor represents DOAs, then its format is (batch, time_steps, 3)

  • doa_mode (bool) – Set this parameter to True if localization_tensor represents DOAs instead of TDOAs. Its default value is False.

  • mics (torch.Tensor) – The cartesian position (xyz coordinates in meters) of each microphone. The tensor must have the following format (n_mics, 3). This parameter is only mandatory when localization_tensor represents DOAs.

  • fs (int) – The sample rate of the signals in Hertz. This parameter is only mandatory when localization_tensor represents DOAs.

  • c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s. This parameter is only used when localization_tensor represents DOAs.

Returns:

Ys

Return type:

torch.Tensor
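
For reference, the textbook MVDR weights for a single frequency bin can be sketched with NumPy (an illustration under standard narrowband assumptions, not the internal _mvdr routine):

```python
import numpy as np

def mvdr_weights(Phi, d, eps=1e-20):
    """w = Phi^-1 d / (d^H Phi^-1 d), with Phi the noise covariance
    matrix and d the steering vector for the target direction."""
    Phi_inv = np.linalg.inv(Phi + eps * np.eye(Phi.shape[0]))
    num = Phi_inv @ d
    return num / (np.conj(d) @ num)
```

The resulting weights satisfy the distortionless constraint w^H d = 1 while minimizing the output noise power.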

class speechbrain.processing.multi_mic.Gev[source]

Bases: Module

Generalized EigenValue decomposition (GEV) beamforming.

Example

>>> from speechbrain.dataio.dataio import read_audio
>>> import torch
>>>
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import Gev
>>>
>>> xs_speech = read_audio(
...    'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_0.70225_-0.70225_0.11704.flac')
>>> xs_noise = xs_noise.unsqueeze(0)
>>> fs = 16000
>>> ss = xs_speech
>>> nn = 0.05 * xs_noise
>>> xs = ss + nn
>>>
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gev = Gev()
>>> istft = ISTFT(sample_rate=fs)
>>>
>>> Ss = stft(ss)
>>> Nn = stft(nn)
>>> Xs = stft(xs)
>>>
>>> SSs = cov(Ss)
>>> NNs = cov(Nn)
>>>
>>> Ys = gev(Xs, SSs, NNs)
>>> ys = istft(Ys)
forward(Xs, SSs, NNs)[source]

This method uses the utility function _gev to perform generalized eigenvalue decomposition beamforming. Therefore, the result has the following format: (batch, time_step, n_fft, 2, 1).

Parameters:
  • Xs (torch.Tensor) – A batch of audio signals in the frequency domain. The tensor must have the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).

  • SSs (torch.Tensor) – The covariance matrices of the target signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).

  • NNs (torch.Tensor) – The covariance matrices of the noise signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).

Returns:

Ys

Return type:

torch.Tensor
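
The idea behind GEV beamforming for one frequency bin can be sketched with NumPy (an illustration, not the internal _gev routine):

```python
import numpy as np

def gev_weights(SS, NN):
    """Eigenvector of the generalized problem SS v = lambda NN v with the
    largest eigenvalue; it maximizes the SNR v^H SS v / v^H NN v."""
    vals, vecs = np.linalg.eig(np.linalg.inv(NN) @ SS)
    return vecs[:, int(np.argmax(vals.real))]
```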

class speechbrain.processing.multi_mic.GccPhat(tdoa_max=None, eps=1e-20)[source]

Bases: Module

Generalized Cross-Correlation with Phase Transform localization.

Parameters:
  • tdoa_max (int) – Specifies a range to search for delays. For example, if tdoa_max = 10, the method will restrict its search for delays between -10 and 10 samples. This parameter is optional and its default value is None. When tdoa_max is None, the method will search for delays between -n_fft/2 and n_fft/2 (the full range).

  • eps (float) – A small value to avoid divisions by 0 with the phase transform. The default value is 1e-20.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT, ISTFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, DelaySum
>>>
>>> xs_speech = read_audio(
...    'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channel]
>>> xs_noise  = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs_noise = xs_noise.unsqueeze(0) #[batch, time, channels]
>>> fs = 16000
>>> xs = xs_speech + 0.05 * xs_noise
>>>
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
forward(XXs)[source]

Perform generalized cross-correlation with phase transform localization by using the utility function _gcc_phat and by extracting the delays (in samples) before performing a quadratic interpolation to improve the accuracy. The result has the format: (batch, time_steps, n_mics + n_pairs).

The order on the last dimension corresponds to the triu_indices for a square matrix. For instance, if we have 4 channels, we get the following order: (0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3) and (3, 3). Therefore, delays[…, 0] corresponds to channels (0, 0) and delays[…, 1] corresponds to channels (0, 1).

Parameters:

XXs (torch.Tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).
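
The core of GCC-PHAT for a single microphone pair can be sketched with NumPy (a coarse illustration without the quadratic interpolation mentioned above, not the internal _gcc_phat routine):

```python
import numpy as np

def gcc_phat_pair(x1, x2, eps=1e-20):
    """Integer delay (in samples) of x2 relative to x1."""
    n = len(x1)
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    cross = X2 * np.conj(X1)
    # Phase transform: whiten the cross-spectrum before the inverse FFT
    r = np.fft.irfft(cross / (np.abs(cross) + eps), n=n)
    delay = int(np.argmax(r))
    return delay if delay <= n // 2 else delay - n  # wrap to a signed delay
```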

class speechbrain.processing.multi_mic.SrpPhat(mics, space='sphere', sample_rate=16000, speed_sound=343.0, eps=1e-20)[source]

Bases: Module

Steered-Response Power with Phase Transform localization.

Parameters:
  • mics (torch.Tensor) – The cartesian coordinates (xyz) in meters of each microphone. The tensor must have the following format (n_mics, 3).

  • space (string) – If this parameter is set to 'sphere', the localization will be done in 3D by searching in a sphere of possible doas. If it is set to 'circle', the search will be done in 2D by searching in a circle. By default, this parameter is set to 'sphere'. Note: The 'circle' option is not implemented yet.

  • sample_rate (int) – The sample rate in Hertz of the signals on which SRP-PHAT is performed. By default, this parameter is set to 16000 Hz.

  • speed_sound (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.

  • eps (float) – A small value to avoid errors like division by 0. The default value of this parameter is 1e-20.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import SrpPhat
>>> xs_speech = read_audio('tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac')
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> fs = 16000
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = xs_noise.unsqueeze(0)
>>> ss1 = xs_speech
>>> ns1 = 0.05 * xs_noise
>>> xs1 = ss1 + ns1
>>> ss2 = xs_speech
>>> ns2 = 0.20 * xs_noise
>>> xs2 = ss2 + ns2
>>> ss = torch.cat((ss1,ss2), dim=0)
>>> ns = torch.cat((ns1,ns2), dim=0)
>>> xs = torch.cat((xs1,xs2), dim=0)
>>> mics = torch.zeros((4,3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> srpphat = SrpPhat(mics=mics)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> doas = srpphat(XXs)
forward(XXs)[source]

Perform SRP-PHAT localization on a signal by computing a steering vector and then by using the utility function _srp_phat to extract the directions of arrival (DOAs). The result is a tensor containing the directions of arrival (xyz coordinates in meters pointing towards the sound source). The output tensor has the format (batch, time_steps, 3).

This localization method uses the Global Coherence Field (GCF): https://www.researchgate.net/publication/221491705_Speaker_localization_based_on_oriented_global_coherence_field

Parameters:

XXs (torch.Tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).

Returns:

doas

Return type:

torch.Tensor
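
The steered-response scan at the heart of SRP-PHAT can be sketched for one microphone pair with NumPy (an illustration of the principle, not the internal _srp_phat routine): each candidate delay steers the whitened cross-spectrum, and the candidate with the highest response power wins.

```python
import numpy as np

def srp_scan(X1, X2, candidates, n_fft, eps=1e-20):
    """Return the candidate delay with the highest steered response power."""
    cross = X2 * np.conj(X1)
    cross = cross / (np.abs(cross) + eps)  # phase transform
    k = np.arange(len(cross))
    scores = [np.real(np.sum(cross * np.exp(1j * 2 * np.pi * k * tau / n_fft)))
              for tau in candidates]
    return candidates[int(np.argmax(scores))]
```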

class speechbrain.processing.multi_mic.Music(mics, space='sphere', sample_rate=16000, speed_sound=343.0, eps=1e-20, n_sig=1)[source]

Bases: Module

Multiple Signal Classification (MUSIC) localization.

Parameters:
  • mics (torch.Tensor) – The cartesian coordinates (xyz) in meters of each microphone. The tensor must have the following format (n_mics, 3).

  • space (string) – If this parameter is set to 'sphere', the localization will be done in 3D by searching in a sphere of possible doas. If it is set to 'circle', the search will be done in 2D by searching in a circle. By default, this parameter is set to 'sphere'. Note: The 'circle' option is not implemented yet.

  • sample_rate (int) – The sample rate in Hertz of the signals on which MUSIC is performed. By default, this parameter is set to 16000 Hz.

  • speed_sound (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.

  • eps (float) – A small value to avoid errors like division by 0. The default value of this parameter is 1e-20.

  • n_sig (int) – An estimate of the number of sound sources. The default value is set to one source.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import Music
>>> xs_speech = read_audio('tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac')
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> fs = 16000
>>> xs_speech = xs_speech.unsqueeze(0) # [batch, time, channels]
>>> xs_noise = xs_noise.unsqueeze(0)
>>> ss1 = xs_speech
>>> ns1 = 0.05 * xs_noise
>>> xs1 = ss1 + ns1
>>> ss2 = xs_speech
>>> ns2 = 0.20 * xs_noise
>>> xs2 = ss2 + ns2
>>> ss = torch.cat((ss1,ss2), dim=0)
>>> ns = torch.cat((ns1,ns2), dim=0)
>>> xs = torch.cat((xs1,xs2), dim=0)
>>> mics = torch.zeros((4,3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> music = Music(mics=mics)
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> doas = music(XXs)
forward(XXs)[source]

Perform MUSIC localization on a signal by computing a steering vector and then by using the utility function _music to extract the directions of arrival. The result is a tensor containing the directions of arrival (xyz coordinates in meters pointing towards the sound source). The output tensor has the format (batch, time_steps, 3).

Parameters:

XXs (torch.Tensor) – The covariance matrices of the input signal. The tensor must have the format (batch, time_steps, n_fft/2 + 1, 2, n_mics + n_pairs).

Returns:

doas

Return type:

torch.Tensor

speechbrain.processing.multi_mic.doas2taus(doas, mics, fs, c=343.0)[source]

This function converts directions of arrival (xyz coordinates expressed in meters) into time differences of arrival (expressed in samples). The result has the following format: (batch, time_steps, n_mics).

Parameters:
  • doas (torch.Tensor) – The directions of arrival expressed with cartesian coordinates (xyz) in meters. The tensor must have the following format: (batch, time_steps, 3).

  • mics (torch.Tensor) – The cartesian position (xyz) in meters of each microphone. The tensor must have the following format (n_mics, 3).

  • fs (int) – The sample rate of the signals in Hertz.

  • c (float) – The speed of sound in the medium. The speed is expressed in meters per second and the default value of this parameter is 343 m/s.

Returns:

taus

Return type:

torch.Tensor

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.multi_mic import sphere, doas2taus
>>> xs = read_audio('tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac')
>>> xs = xs.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000
>>> mics = torch.zeros((4,3), dtype=torch.float)
>>> mics[0,:] = torch.FloatTensor([-0.05, -0.05, +0.00])
>>> mics[1,:] = torch.FloatTensor([-0.05, +0.05, +0.00])
>>> mics[2,:] = torch.FloatTensor([+0.05, +0.05, +0.00])
>>> mics[3,:] = torch.FloatTensor([+0.05, -0.05, +0.00])
>>> doas = sphere()
>>> taus = doas2taus(doas, mics, fs)
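
The conversion amounts to projecting each microphone position onto the unit DOA vector and scaling to samples, which can be sketched in plain Python (the sign convention here is an assumption for illustration; the actual doas2taus implementation may differ):

```python
def doas2taus_sketch(doa, mics, fs, c=343.0):
    """tau_m = (fs / c) * <doa, mic_m> for each mic position mic_m."""
    return [fs / c * sum(d * m for d, m in zip(doa, mic)) for mic in mics]
```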
speechbrain.processing.multi_mic.tdoas2taus(tdoas)[source]

This function selects the tdoas of each channel and puts them in a tensor. The result has the following format: (batch, time_steps, n_mics).

Parameters:

tdoas (torch.Tensor) – The time difference of arrival (TDOA) (in samples) for each timestamp. The tensor has the format (batch, time_steps, n_mics + n_pairs).

Returns:

taus

Return type:

torch.Tensor

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, tdoas2taus
>>>
>>> xs_speech = read_audio(
...    'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs = xs_speech + 0.05 * xs_noise
>>> xs = xs.unsqueeze(0)
>>> fs = 16000
>>>
>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>>
>>> Xs = stft(xs)
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> taus = tdoas2taus(tdoas)
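
The selection can be sketched in plain Python (an illustrative assumption: the taus are the entries of the tdoa vector at the triu positions (0, m), i.e. the pairs involving the reference channel 0):

```python
def tdoas2taus_sketch(tdoas, n_mics):
    """Keep the tdoa entries whose triu pair starts with channel 0."""
    pairs = [(i, j) for i in range(n_mics) for j in range(i, n_mics)]
    return [tdoas[k] for k, (i, _) in enumerate(pairs) if i == 0]
```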
speechbrain.processing.multi_mic.steering(taus, n_fft)[source]

This function computes steering vectors by using the time differences of arrival for each channel (expressed in samples) and the number of bins (n_fft). The result has the following format: (batch, time_step, n_fft/2 + 1, 2, n_mics).

Parameters:

taus (torch.Tensor) – The time differences of arrival for each channel. The tensor must have the following format: (batch, time_steps, n_mics).

n_fft (int) – The number of bins resulting of the STFT. It is assumed that the parameter "onesided" was set to True for the STFT.

Example

>>> import torch
>>> from speechbrain.dataio.dataio import read_audio
>>> from speechbrain.processing.features import STFT
>>> from speechbrain.processing.multi_mic import Covariance
>>> from speechbrain.processing.multi_mic import GccPhat, tdoas2taus, steering
>>>
>>> xs_speech = read_audio(
...    'tests/samples/multi-mic/speech_-0.82918_0.55279_-0.082918.flac'
... )
>>> xs_noise = read_audio('tests/samples/multi-mic/noise_diffuse.flac')
>>> xs = xs_speech + 0.05 * xs_noise
>>> xs = xs.unsqueeze(0) # [batch, time, channels]
>>> fs = 16000

>>> stft = STFT(sample_rate=fs)
>>> cov = Covariance()
>>> gccphat = GccPhat()
>>>
>>> Xs = stft(xs)
>>> n_fft = Xs.shape[2]
>>> XXs = cov(Xs)
>>> tdoas = gccphat(XXs)
>>> taus = tdoas2taus(tdoas)
>>> As = steering(taus, n_fft)
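
The underlying formula can be sketched with NumPy (an illustration; it returns complex values directly rather than the (…, 2, n_mics) real/imaginary layout used by the module): for a delay tau in samples and bin k, the steering component is e^{-j 2 pi k tau / n_fft} with k = 0 … n_fft/2.

```python
import numpy as np

def steering_sketch(taus, n_fft):
    """taus: (n_mics,) delays in samples -> (n_fft/2 + 1, n_mics) vectors."""
    k = np.arange(n_fft // 2 + 1)
    return np.exp(-1j * 2.0 * np.pi * np.outer(k, taus) / n_fft)
```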
speechbrain.processing.multi_mic.sphere(levels_count=4)[source]

This function generates cartesian coordinates (xyz) for a set of points forming a 3D sphere. The coordinates are expressed in meters and can be used as doas. The result has the format: (n_points, 3).

Parameters:

levels_count (int) –

A number proportional to the number of points that the user wants to generate.

  • If levels_count = 1, then the sphere will have 42 points

  • If levels_count = 2, then the sphere will have 162 points

  • If levels_count = 3, then the sphere will have 642 points

  • If levels_count = 4, then the sphere will have 2562 points

  • If levels_count = 5, then the sphere will have 10242 points

By default, levels_count is set to 4.

Returns:

pts – The list of xyz points in the sphere.

Return type:

torch.Tensor

Example

>>> import torch
>>> from speechbrain.processing.multi_mic import sphere
>>> doas = sphere()
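
The point counts listed above follow the standard icosphere subdivision formula (an observation about the numbers, not taken from the source):

```python
def sphere_points(levels_count):
    """Number of points on a subdivided icosphere: 10 * 4**levels + 2."""
    return 10 * 4 ** levels_count + 2

counts = [sphere_points(level) for level in range(1, 6)]
# counts == [42, 162, 642, 2562, 10242], matching the table above
```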