Learning sample marginal distribution with CO-Optimal Transport
In this example, we illustrate how to estimate the sample marginal distribution that minimizes the CO-Optimal Transport distance [47]_ between two matrices. More precisely, given source data \((X, \mu_x^{(s)}, \mu_x^{(f)})\) and a target matrix \(Y\) associated with a fixed histogram on features \(\mu_y^{(f)}\), we want to solve the following problem
\[\min_{\mu_y^{(s)} \in \Delta} \text{COOT}\left( (X, \mu_x^{(s)}, \mu_x^{(f)}), (Y, \mu_y^{(s)}, \mu_y^{(f)}) \right)\]
where \(\Delta\) is the probability simplex. This minimization is performed with a simple projected gradient descent in PyTorch. We use the automatic backend of POT, which allows us to compute the CO-Optimal Transport distance with ot.coot.co_optimal_transport2() with differentiable losses.
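Each projected-gradient step maps the updated weights back onto \(\Delta\). As a minimal illustrative sketch (independent of POT; the `proj_simplex` helper below is not POT's implementation, though `ot.utils.proj_simplex` used later plays the same role), the Euclidean projection onto the probability simplex can be computed with the classical sort-based algorithm:

```python
import numpy as np


def proj_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {w : w >= 0, sum(w) = 1}, via the sort-based algorithm."""
    u = np.sort(v)[::-1]                   # sort in decreasing order
    css = np.cumsum(u)
    # largest index rho such that u[rho] * (rho + 1) > css[rho] - 1
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)   # shift that normalizes the mass
    return np.maximum(v - theta, 0.0)      # clip negatives to zero


w = proj_simplex(np.array([0.5, 0.2, -0.1, 0.6]))
print(w, w.sum())  # -> [0.4 0.1 0.  0.5] 1.0
```

Note that the projection both zeroes out negative entries and rescales the remaining mass to sum to one, which is exactly what the gradient loop below relies on.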
# Author: Remi Flamary <remi.flamary@unice.fr>
# Quang Huy Tran <quang-huy.tran@univ-ubs.fr>
# License: MIT License
from matplotlib.patches import ConnectionPatch
import torch
import numpy as np
import matplotlib.pyplot as pl
import ot
from ot.coot import co_optimal_transport as coot
from ot.coot import co_optimal_transport2 as coot2
Generating data
The source and clean target matrices are generated by \(X_{i,j} = \cos(\frac{i}{n_1} \pi) + \cos(\frac{j}{d_1} \pi)\) and \(Y_{i,j} = \cos(\frac{i}{n_2} \pi) + \cos(\frac{j}{d_2} \pi)\). The target matrix is then contaminated by adding 5 row outliers. Intuitively, we expect the estimated sample distribution to ignore these outliers, i.e. their weights should be zero.
np.random.seed(182)
n1, d1 = 20, 16
n2, d2 = 10, 8
n = 15
X = (
    torch.cos(torch.arange(n1) * torch.pi / n1)[:, None] +
    torch.cos(torch.arange(d1) * torch.pi / d1)[None, :]
)
# Generate clean target data mixed with outliers
Y_noisy = torch.randn((n, d2)) * 10.0
Y_noisy[:n2, :] = (
    torch.cos(torch.arange(n2) * torch.pi / n2)[:, None] +
    torch.cos(torch.arange(d2) * torch.pi / d2)[None, :]
)
Y = Y_noisy[:n2, :]
X, Y_noisy, Y = X.double(), Y_noisy.double(), Y.double()
fig, axes = pl.subplots(nrows=1, ncols=3, figsize=(12, 5))
axes[0].imshow(X, vmin=-2, vmax=2)
axes[0].set_title('$X$')
axes[1].imshow(Y, vmin=-2, vmax=2)
axes[1].set_title('Clean $Y$')
axes[2].imshow(Y_noisy, vmin=-2, vmax=2)
axes[2].set_title('Noisy $Y$')
pl.tight_layout()

Optimizing the COOT distance with respect to the sample marginal distribution
losses = []
lr = 1e-3
niter = 1000
b = torch.tensor(ot.unif(n), requires_grad=True)
for i in range(niter):
    loss = coot2(X, Y_noisy, wy_samp=b, log=False, verbose=False)
    losses.append(float(loss))
    loss.backward()

    with torch.no_grad():
        b -= lr * b.grad  # gradient step
        b[:] = ot.utils.proj_simplex(b)  # projection on the simplex
        b.grad.zero_()
# Estimated sample marginal distribution and training loss curve
pl.plot(losses[10:])
pl.title('CO-Optimal Transport distance')
print(f"Marginal distribution = {b.detach().numpy()}")

Marginal distribution = [0.07507868 0.08001347 0.09469872 0.1001999 0.10001527 0.10001687
0.09999904 0.09979829 0.11466591 0.13551386 0. 0.
0. 0. 0. ]
Visualizing the row and column alignments with the estimated sample marginal distribution
Clearly, the learned marginal distribution completely succeeds in ignoring the 5 outliers.
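As a quick sanity check, we can reuse the weights printed above and verify that all the mass sits on the 10 clean rows, while the 5 outlier rows (indices 10 to 14) receive exactly zero weight:

```python
import numpy as np

# Marginal distribution printed above: first 10 entries are the clean
# rows, the last 5 entries are the outlier rows
b_est = np.array([0.07507868, 0.08001347, 0.09469872, 0.1001999, 0.10001527,
                  0.10001687, 0.09999904, 0.09979829, 0.11466591, 0.13551386,
                  0.0, 0.0, 0.0, 0.0, 0.0])

clean_mass = b_est[:10].sum()    # mass assigned to the clean rows (~1.0)
outlier_mass = b_est[10:].sum()  # mass assigned to the outlier rows (0.0)
print(clean_mass, outlier_mass)
```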
X, Y_noisy = X.numpy(), Y_noisy.numpy()
b = b.detach().numpy()
pi_sample, pi_feature = coot(X, Y_noisy, wy_samp=b, log=False, verbose=True)
fig = pl.figure(4, (9, 7))
pl.clf()
ax1 = pl.subplot(2, 2, 3)
pl.imshow(X, vmin=-2, vmax=2)
pl.xlabel('$X$')
ax2 = pl.subplot(2, 2, 2)
ax2.yaxis.tick_right()
pl.imshow(np.transpose(Y_noisy), vmin=-2, vmax=2)
pl.title("Transpose(Noisy $Y$)")
ax2.xaxis.tick_top()
for i in range(n1):
    j = np.argmax(pi_sample[i, :])
    xyA = (d1 - .5, i)
    xyB = (j, d2 - .5)
    con = ConnectionPatch(xyA=xyA, xyB=xyB, coordsA=ax1.transData,
                          coordsB=ax2.transData, color="black")
    fig.add_artist(con)
for i in range(d1):
    j = np.argmax(pi_feature[i, :])
    xyA = (i, -.5)
    xyB = (-.5, j)
    con = ConnectionPatch(
        xyA=xyA, xyB=xyB, coordsA=ax1.transData, coordsB=ax2.transData, color="blue")
    fig.add_artist(con)

CO-Optimal Transport cost at iteration 1: 0.010389716046318498
Total running time of the script: (0 minutes 3.856 seconds)