.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/applications/plot_digits_denoising.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_applications_plot_digits_denoising.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_applications_plot_digits_denoising.py:


================================
Image denoising using kernel PCA
================================

This example shows how to use :class:`~sklearn.decomposition.KernelPCA` to
denoise images. In short, we take advantage of the approximation function
learned during `fit` to reconstruct the original image.

We will compare the results with an exact reconstruction using
:class:`~sklearn.decomposition.PCA`.

We will use the USPS digits dataset to reproduce the results presented in
Section 4 of [1]_.

.. rubric:: References

.. [1] Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf.
    "Learning to find pre-images."
    Advances in Neural Information Processing Systems 16 (2004): 449-456.

.. GENERATED FROM PYTHON SOURCE LINES 20-24

.. code-block:: Python


    # Author: Guillaume Lemaitre
    # License: BSD 3 clause


.. GENERATED FROM PYTHON SOURCE LINES 25-29

Load the dataset via OpenML
---------------------------

The USPS digits dataset is available on OpenML. We use
:func:`~sklearn.datasets.fetch_openml` to fetch it. In addition, we normalize
the dataset so that all pixel values lie in the range [0, 1].

.. GENERATED FROM PYTHON SOURCE LINES 29-38

.. code-block:: Python

    import numpy as np

    from sklearn.datasets import fetch_openml
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler

    X, y = fetch_openml(data_id=41082, as_frame=False, return_X_y=True)
    X = MinMaxScaler().fit_transform(X)


.. GENERATED FROM PYTHON SOURCE LINES 39-44

The idea is to learn a PCA basis (with and without a kernel) on noisy images
and then use these models to reconstruct and denoise noisy images.

Thus, we split our dataset into a training set of 1,000 samples and a test
set of 100 samples. These images are noise-free and we will use them to
evaluate the efficiency of the denoising approaches. In addition, we create a
copy of each set and add Gaussian noise to it.

The goal of this application is to show that we can denoise corrupted test
images with a PCA basis learned on a separate set of noisy training images.
We will use both a PCA and a kernel-based PCA to solve this problem.

.. GENERATED FROM PYTHON SOURCE LINES 44-55

..
.. code-block:: Python

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0, train_size=1_000, test_size=100
    )

    rng = np.random.RandomState(0)
    noise = rng.normal(scale=0.25, size=X_test.shape)
    X_test_noisy = X_test + noise

    noise = rng.normal(scale=0.25, size=X_train.shape)
    X_train_noisy = X_train + noise


.. GENERATED FROM PYTHON SOURCE LINES 56-57

In addition, we create a helper function to qualitatively assess the image
reconstruction by plotting the test images.

.. GENERATED FROM PYTHON SOURCE LINES 57-70

.. code-block:: Python

    import matplotlib.pyplot as plt


    def plot_digits(X, title):
        """Small helper function to plot 100 digits."""
        fig, axs = plt.subplots(nrows=10, ncols=10, figsize=(8, 8))
        for img, ax in zip(X, axs.ravel()):
            ax.imshow(img.reshape((16, 16)), cmap="Greys")
            ax.axis("off")
        fig.suptitle(title, fontsize=24)


.. GENERATED FROM PYTHON SOURCE LINES 71-74

In addition, we will use the mean squared error (MSE) to quantitatively
assess the image reconstruction.

Let's first look at the difference between the noise-free and the noisy
images. We will inspect the test set for this purpose.

.. GENERATED FROM PYTHON SOURCE LINES 74-79

.. code-block:: Python

    plot_digits(X_test, "Uncorrupted test images")
    plot_digits(
        X_test_noisy, f"Noisy test images\nMSE: {np.mean((X_test - X_test_noisy) ** 2):.2f}"
    )


.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_001.png
         :alt: Uncorrupted test images
         :srcset: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_002.png
         :alt: Noisy test images MSE: 0.06
         :srcset: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_002.png
         :class: sphx-glr-multi-img


.. GENERATED FROM PYTHON SOURCE LINES 80-84

Learn the `PCA` basis
---------------------

We can now learn our PCA basis using both a linear PCA and a kernel PCA that
uses a radial basis function (RBF) kernel.

.. GENERATED FROM PYTHON SOURCE LINES 84-99

..
.. code-block:: Python

    from sklearn.decomposition import PCA, KernelPCA

    pca = PCA(n_components=32, random_state=42)
    kernel_pca = KernelPCA(
        n_components=400,
        kernel="rbf",
        gamma=1e-3,
        fit_inverse_transform=True,
        alpha=5e-3,
        random_state=42,
    )

    pca.fit(X_train_noisy)
    _ = kernel_pca.fit(X_train_noisy)


.. GENERATED FROM PYTHON SOURCE LINES 100-104

Reconstruct and denoise test images
-----------------------------------

Now, we can transform and reconstruct the noisy test set. Since we keep fewer
components than are available (32 of 256 for PCA, 400 of at most 1,000 for
kernel PCA), we obtain an approximation of the original set. Indeed, by
dropping the components that explain the least variance in PCA, we hope to
remove noise. The same reasoning applies to kernel PCA; however, we expect a
better reconstruction because we use a non-linear kernel to learn the PCA
basis and a kernel ridge regression to learn the mapping function.

.. GENERATED FROM PYTHON SOURCE LINES 104-109

.. code-block:: Python

    X_reconstructed_kernel_pca = kernel_pca.inverse_transform(
        kernel_pca.transform(X_test_noisy)
    )
    X_reconstructed_pca = pca.inverse_transform(pca.transform(X_test_noisy))


.. GENERATED FROM PYTHON SOURCE LINES 110-123

.. code-block:: Python

    plot_digits(X_test, "Uncorrupted test images")
    plot_digits(
        X_reconstructed_pca,
        f"PCA reconstruction\nMSE: {np.mean((X_test - X_reconstructed_pca) ** 2):.2f}",
    )
    plot_digits(
        X_reconstructed_kernel_pca,
        (
            "Kernel PCA reconstruction\n"
            f"MSE: {np.mean((X_test - X_reconstructed_kernel_pca) ** 2):.2f}"
        ),
    )


.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_003.png
         :alt: Uncorrupted test images
         :srcset: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_003.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_004.png
         :alt: PCA reconstruction MSE: 0.01
         :srcset: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_004.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_005.png
         :alt: Kernel PCA reconstruction MSE: 0.03
         :srcset: /auto_examples/applications/images/sphx_glr_plot_digits_denoising_005.png
         :class: sphx-glr-multi-img


..
.. GENERATED FROM PYTHON SOURCE LINES 124-127

PCA has a lower MSE than kernel PCA. However, the qualitative analysis might
not favor PCA over kernel PCA. We observe that kernel PCA is able to remove
background noise and provide a smoother image.

However, it should be noted that the results of the denoising with kernel PCA
will depend on the parameters `n_components`, `gamma`, and `alpha`.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 4.743 seconds)


.. _sphx_glr_download_auto_examples_applications_plot_digits_denoising.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/applications/plot_digits_denoising.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_digits_denoising.ipynb <plot_digits_denoising.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_digits_denoising.py <plot_digits_denoising.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_digits_denoising.zip <plot_digits_denoising.zip>`


.. include:: plot_digits_denoising.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
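
The closing remark about `n_components`, `gamma`, and `alpha` can be explored
with a small parameter sweep. The following is only a sketch under stated
assumptions: it substitutes the 8x8 digits bundled with scikit-learn
(:func:`~sklearn.datasets.load_digits`) for the USPS data so that nothing is
downloaded, and the helper name `denoising_mse`, the reduced sample and
component counts, and the parameter grid are illustrative choices that are not
part of the original example.

.. code-block:: Python

    import numpy as np

    from sklearn.datasets import load_digits
    from sklearn.decomposition import KernelPCA
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler

    # Assumption: the bundled 8x8 digits stand in for USPS to avoid a download.
    X, y = load_digits(return_X_y=True)
    X = MinMaxScaler().fit_transform(X)
    X_train, X_test = train_test_split(
        X, stratify=y, random_state=0, train_size=500, test_size=100
    )

    # Same noise model as in the example: additive Gaussian noise, scale 0.25.
    rng = np.random.RandomState(0)
    X_train_noisy = X_train + rng.normal(scale=0.25, size=X_train.shape)
    X_test_noisy = X_test + rng.normal(scale=0.25, size=X_test.shape)


    def denoising_mse(gamma, alpha, n_components=100):
        """Fit kernel PCA on noisy training data and return the test MSE.

        Hypothetical helper written for this sketch only.
        """
        kernel_pca = KernelPCA(
            n_components=n_components,
            kernel="rbf",
            gamma=gamma,
            alpha=alpha,
            fit_inverse_transform=True,
            random_state=42,
        )
        kernel_pca.fit(X_train_noisy)
        X_denoised = kernel_pca.inverse_transform(kernel_pca.transform(X_test_noisy))
        return float(np.mean((X_test - X_denoised) ** 2))


    # Illustrative grid; these values are not tuned recommendations.
    results = {
        (gamma, alpha): denoising_mse(gamma, alpha)
        for gamma in (1e-3, 1e-2, 1e-1)
        for alpha in (5e-3, 5e-1)
    }
    for (gamma, alpha), mse in sorted(results.items()):
        print(f"gamma={gamma:g}, alpha={alpha:g}: MSE={mse:.4f}")

Because a lower MSE does not necessarily mean a visually better
reconstruction, a sweep like this is best paired with a visual inspection of
the denoised digits.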