高级绘图示例#

如果你想在实时的 Python 内核中尝试这个笔记本,请使用 mybinder:

https://mybinder.org/badge_logo.svg

Vaex 使用 matplotlib 来创建图表,这提供了极大的灵活性。为了避免重复的“样板”代码,Vaex 尝试涵盖许多用例,您可以使用简单的声明式风格来绘制一个或多个面板。

以下示例将使用示例数据集,该数据集是对类似我们银河系的星系形成过程的数值模拟结果(来源)。数据包含模拟中每个起始粒子的3D位置、速度、角动量、能量和铁含量。

让我们从加载数据开始:

[1]:
import vaex
import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')
[2]:
df = vaex.example()
df.head()
[2]:
# id x y z vx vy vz E L Lz FeH
0 0 1.23187 -0.396929-0.598058 301.155 174.059 27.4275 -149431 407.389 333.956-1.00539
1 23 -0.163701 3.65422 -0.254906 -195 170.472 142.53 -124248 890.241 684.668-1.70867
2 32 -2.12026 3.32605 1.70784 -48.6342 171.647 -2.07944-138501 372.241-202.176-1.83361
3 8 4.71559 4.58525 2.25154 -232.421 -294.851 62.8586 -60037 1297.63 -324.688-1.47869
4 16 7.21719 11.9947 -1.06456 -1.68917 181.329 -11.3336 -83206.81332.8 1328.95 -1.85705
5 16 -7.78437 5.98977 -0.682695 86.7009 -238.778 -2.31309 -86497.61353.25 1339.42 -1.91944
6 12 8.08373 -3.27348 5.54687 -57.4544 120.117 5.37438-101867 1100.8 782.915-1.93517
7 26 -3.55719 5.41363 0.0917156 -67.0511 -145.933 39.6374 -127682 921.008 882.101-1.79423
8 25 3.9848 5.40691 2.57724 -38.7449 -152.407 -92.9073 -113632 493.316-397.824-1.18076
9 8-20.8139 -3.29468 13.4866 99.4067 28.6749-115.079 -55825.31088.46 -269.324-1.28892

单个图表#

最简单的情况是由两个轴创建的单个热图,由前两个参数指定:

[3]:
df.viz.heatmap('x', 'y', title='Face on galaxy', limits='99%')
../_images/guides_advanced_plotting_5_0.png

相同类型的多个图表#

第一个参数可以是一个轴对的列表。这将生成多个图:

[4]:
df.viz.heatmap([["x", "y"], ["x", "z"]], title="Face on and edge on", figsize=(10, 4), limits='99%');
../_images/guides_advanced_plotting_7_0.png

多个图表,相同坐标轴,不同统计#

如果 what 参数是一个列表,默认情况下它将创建多个子图:

[5]:
df.viz.heatmap("x", "y", what=["count(*)", "mean(vx)", "correlation(vy,vz)"],
               title="Different statistics",
               figsize=(10, 5), limits='99%');
../_images/guides_advanced_plotting_9_0.png

多个图表,不同的轴,不同的统计#

可以指定多个轴对作为第一个参数,以及一个what参数列表。生成的图形将包含多个子图,其中不同的轴组合将形成行,而不同的what统计量将形成列:

[6]:
df.viz.heatmap([["x", "y"], ["x", "z"], ["y", "z"]],
               what=["count(*)", "mean(vx)", "correlation(vx,vy)", "correlation(vx,vz)"],
               title="Different statistics and plots",
               figsize=(14,12),
               limits='99%');
../_images/guides_advanced_plotting_11_0.png

还可以通过visual参数指定图形的布局,该参数可用于交换子图的行和列顺序:

[7]:
df.viz.heatmap([["x", "y"], ["x", "z"], ["y", "z"]],
               what=["count(*)", "mean(vx)", "correlation(vx,vy)", "correlation(vx,vz)"],
               visual=dict(row="what", column="subspace"),
               title="Different statistics and plots",
               figsize=(14,12),
               limits='99%');
../_images/guides_advanced_plotting_13_0.png

第三维度的切片#

如果提供了第三个轴(z),你可以“切片”数据,将z切片显示为行。请注意,这里的行是换行的,可以通过wrap_columns参数进行更改:

[8]:
df.viz.heatmap("Lz", "E", z="FeH:-3,-1,8",
               visual=dict(row="z"),
               figsize=(12, 8),
               f="log",
               wrap_columns=3,
               limits='99%');
../_images/guides_advanced_plotting_15_0.png

多图环绕#

如果尝试创建一个包含许多子图的图形,它们将会很好地排列。在示例数据集中,我们创建了所有列组合的热图,按它们的互信息排序:

[9]:
# Get all column pars
pairs = df.combinations(exclude=['id'])
# Calculate the mutual information for each pair, sorted by the largest value
mi, pairs_sorted = df.mutual_information(pairs, sort=True)

# Create the figure
df.viz.heatmap(pairs_sorted, f='log', colorbar=False, figsize=(14, 20), limits='99%', wrap_columns=5);
../_images/guides_advanced_plotting_17_0.png

绘图选择#

如果使用了selection参数,则只绘制选定的部分:

[10]:
df.viz.heatmap("x", "y", selection="sqrt(x**2+y**2) < 5", limits=[-10, 10]);
../_images/guides_advanced_plotting_19_0.png

如果指定了选择列表(FalseNone 表示没有选择),那么默认情况下,每个选择都会形成所生成图形的不同“层”:

[11]:
df.viz.heatmap("x", "y",
               selection=[None, "sqrt(x**2+y**2) < 5", "(sqrt(x**2+y**2) < 7) & (x < 0)"],
               limits=[-10, 10]);
../_images/guides_advanced_plotting_21_0.png

在热图上叠加矢量场#

天文学家认为,像我们银河系这样的星系是由许多前星系团块合并和混合而成的。尝试找到原始前星系碎片的一种方法是检查它们的能量(𝐸)和角动量(𝐿𝑧)的二维分布。因此,让我们制作这样的图表:

[12]:
df.viz.heatmap('Lz', 'E', f='log', figsize=(9, 6));
../_images/guides_advanced_plotting_23_0.png

现在,为了展示上图中每个星团中的恒星确实在空间中一致移动,我们可以在位置热图上叠加它们的速度矢量。

首先,让我们选择属于其中一个星团的星星:

[13]:
# specify ranges of angular momentum (Lz) and energy (E)
limits_Lz_E_clump = (1181.770, 1291.92), (-70850.91, -68491.16)

# Use the rectangle selection method
df.select_rectangle("Lz", "E", limits_Lz_E_clump, name="stream")

# Check how many stars we have selected
print(f'Selection contains {df.count(selection="stream")} "stars".')
Selection contains 9556 "stars".

我们还可以叠加显示所选区域,以确信我们选择了一个好的区域:

[14]:
df.viz.heatmap("Lz", "E", selection=[None, "stream"], f="log", figsize=(9, 6));
../_images/guides_advanced_plotting_27_0.png

现在让我们在𝑦−𝑧图上绘制𝑣𝑦和𝑣𝑧速度矢量。首先,我们计算一个平均𝑣𝑦和𝑣𝑧速度的网格。请注意,我们将𝑣𝑦和𝑣𝑧值的范围限制在-20到20之间,网格分辨率为32x32个区间:

[15]:
limits = [-20, 20]
shape_vector = 32
mean_vy = df.mean("vy", binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
mean_vz = df.mean("vz", binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')

接下来,让我们创建一个网格来保存箱子的中心:

[16]:
# create a 2d array with holds the center of the bins
centers = np.linspace(*limits, shape_vector, endpoint=False) + (limits[1] - limits[0])/2./shape_vector
z, y = np.meshgrid(centers, centers)

为了保持图表的“整洁”,我们也不希望可视化计数较少的箱子的速度:

[17]:
# we don't want to show bins with low number of counts
counts = df.count(binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
mask = counts.flatten( ) > 10

最后,我们可以绘制一个\(v_y\)\(v_z\)的背景密度图,然后使用plt.quiver来叠加速度矢量:

[18]:
df.viz.heatmap("y", "z", limits=limits, f="log1p", figsize=(10, 9), selection=[None, "stream"], shape=128)

# overplot the mean velocity vectors
plt.quiver(y.flatten()[mask],
           z.flatten()[mask],
           mean_vy.flatten()[mask],
           mean_vz.flatten()[mask],
           color="white",
           alpha=0.75);
../_images/guides_advanced_plotting_35_0.png

我们确实看到我们选择的星星一起移动,并形成了一条流!

绘制healpix地图#

Healpix 通过 healpy 包提供。Vaex 不需要对 healpix 进行特殊支持,但引入了一些辅助函数以使使用 healpix 更加方便。

确保你已经安装了healpy。如果没有,你可以使用以下命令之一来安装它:

!pip install healpy  # if you prefer pip
!conda install -c conda-forge healpy if you are using a conda package manager

为了更好地理解这一点,我们将从头开始。如果我们想制作一个密度天空图,我们希望向healpy传递一个一维numpy数组,其中每个值代表球体上某个位置的密度,该位置由数组大小(healpix级别)和偏移量(位置)决定。

此示例使用了模拟的Gaia数据集。Gaia数据包括在source_id列中编码的healpix索引。通过将source_id除以34359738368,您可以得到healpix索引级别12,进一步除以该值将带您到更低的级别。

让我们从获取数据集开始(注意:数据集在磁盘上约为700MB)。

[19]:
import healpy as hp
[20]:
df = vaex.datasets.tgas(full=True)
df.head()
[20]:
# astrometric_delta_q astrometric_excess_noise astrometric_excess_noise_sig astrometric_n_bad_obs_ac astrometric_n_bad_obs_al astrometric_n_good_obs_ac astrometric_n_good_obs_al astrometric_n_obs_ac astrometric_n_obs_al astrometric_primary_flag astrometric_priors_used astrometric_relegation_factor astrometric_weight_ac astrometric_weight_al b dec dec_error dec_parallax_corr dec_pmdec_corr dec_pmra_corr duplicated_source ecl_lat ecl_lon hip l matched_observations parallax parallax_error parallax_pmdec_corr parallax_pmra_corr phot_g_mean_flux phot_g_mean_flux_error phot_g_mean_mag phot_g_n_obsphot_variable_flag pmdec pmdec_error pmra pmra_error pmra_pmdec_corr ra ra_dec_corr ra_error ra_parallax_corr ra_pmdec_corr ra_pmra_corr random_index ref_epoch scan_direction_mean_k1 scan_direction_mean_k2 scan_direction_mean_k3 scan_direction_mean_k4 scan_direction_strength_k1 scan_direction_strength_k2 scan_direction_strength_k3 scan_direction_strength_k4 solution_id source_idtycho2_id
0 1.91906 0.717101 412.606 1 0 78 79 79 79 84 3 2.9361 1.26696e-05 1.81816-48.71440.235392 0.218802 -0.407338 0.0606588 -0.0994513 70 -16.1211 42.6418 13989176.74 9 6.35295 0.30791 -0.101957 -0.00157679 1.03123e+07 10577.4 7.99138 77b'NOT_AVAILABLE' -7.64199 0.0874018 43.7523 0.0705422 0.21467745.0343 -0.414972 0.305989 0.179966 -0.0857597 0.159207 243619 2015 -113.76 21.3929 -41.6784 26.2018 0.382348 0.538266 0.392379 0.9163061635378410781933568 7627862074752b''
1 nan 0.253463 47.3163 2 0 55 57 57 57 84 5 2.65231 3.16002e-05 12.8616 -48.645 0.200068 1.19779 0.837626 -0.975644 0.972577 70 -16.193 42.7612-2147483648176.916 8 3.90033 0.323488 -0.853779 0.839739 949565 1140.17 10.581 62b'NOT_AVAILABLE' -55.1092 2.52293 10.0363 4.61141 -0.99639945.165 -0.995923 2.58388 -0.860911 0.97348 -0.972417 487238 2015 -156.433 22.7661 -36.2397 22.8906 0.711003 0.96597 0.646115 0.86716 1635378410781933568 9277129363072b'55-28-1'
2 nan 0.398901 221.185 4 1 57 60 61 61 84 5 3.9934 2.56339e-05 5.76753-48.66780.248825 0.180326 -0.391891 -0.193256 0.0894205 70 -16.1234 42.6975-2147483648176.78 7 3.15531 0.273484 -0.118552 -0.0418587 817838 1827.38 10.7431 60b'NOT_AVAILABLE' -1.60287 1.03526 2.93228 1.90864 -0.91427145.0862 -0.177443 0.213836 0.307722 -0.184817 0.0468668 1948952 2015 -117.008 19.7722 -43.1082 26.7157 0.482528 0.428758 0.524153 0.903062163537841078193356813297218905216b'55-1191-1'
3 nan 0.422492 179.982 1 0 51 52 52 52 84 5 4.21516 2.86726e-05 5.36086-48.68240.248211 0.200958 -0.337217 -0.223501 0.131811 70 -16.1182 42.6778-2147483648176.76 7 2.29237 0.280972 -0.109202 -0.0494409 602053 905.877 11.0757 61b'NOT_AVAILABLE' -18.4149 1.12985 3.66198 2.06505 -0.92617745.0665 -0.365707 0.276039 0.202878 -0.0589288 -0.0509089 102321 2015 -132.421 22.5693 -38.9545 25.8786 0.494655 0.638456 0.509074 0.898918163537841078193356813469017597184b'55-624-1'
4 nan 0.3175 119.748 2 3 85 84 87 87 84 5 3.23564 2.22788e-05 8.08078-48.572 0.335044 0.17013 -0.438708 -0.279349 0.121792 70 -16.0555 42.7734-2147483648176.739 11 1.58208 0.261539 -0.329196 0.100312 1.38812e+06 2826.43 10.1687 96b'NOT_AVAILABLE' -2.37939 0.710632 0.340802 1.22048 -0.83360445.136 -0.0490526 0.170697 0.471425 -0.156392 -0.152076 409284 2015 -106.86 4.4521 -47.8954 26.7555 0.520654 0.23931 0.653377 0.863385163537841078193356815736760328576b'55-849-1'
5 nan 0.303723 64.6868 2 1 68 69 70 70 84 5 3.10892 2.22511e-05 9.65279-48.55110.359618 0.179848 -0.437142 -0.376402 0.257906 70 -16.0335 42.7861-2147483648176.718 9 8.66308 0.255867 -0.297309 0.0791063 1.66384e+06 1381.58 9.97199 76b'NOT_AVAILABLE' -72.7114 0.720852 -52.8493 1.26429 -0.85278445.1414 -0.264588 0.205008 0.39493 0.102073 -0.36853 204642 2015 -127.824 16.3828 -44.2417 25.1631 0.522809 0.479366 0.621515 0.847412163537841078193356816527034310784b'55-182-1'
6 nan 0.340405 118.911 2 1 76 77 78 78 84 5 3.44745 2.19728e-05 7.91894-48.52420.386343 0.17188 -0.341053 -0.34408 0.1516 70 -16.0114 42.8058-2147483648176.701 9 5.6982 0.263677 -0.367848 0.0846782 1.821e+06 2755.91 9.874 77b'NOT_AVAILABLE' -3.35036 0.707184 24.5272 1.17738 -0.80009845.153 -0.0412512 0.189524 0.488929 -0.163855 -0.195289 540954 2015 -114.478 11.0431 -46.4507 26.2651 0.512088 0.322961 0.637399 0.856398163537841078193356816733192740608b'55-867-1'
7 nan 0.253709 88.6261 3 0 76 79 79 79 84 5 2.65453 2.57372e-05 13.709 -48.55690.380844 0.150943 -0.139315 -0.358996 0.238914 70 -16.0049 42.7641-2147483648176.665 10 2.09081 0.222206 -0.277202 0.093748 967144 601.802 10.561 87b'NOT_AVAILABLE' -11.6616 0.982994 -1.57293 1.73319 -0.90422345.1128 -0.187136 0.206981 0.412381 0.0994892 -0.284353 1081909 2015 -88.3027 14.7861 -47.9744 27.0228 0.39079 0.333692 0.400387 0.90071 163537841078193356816870631694208b'55-72-1'
8 nan 0.401473 226.044 3 1 69 71 72 72 84 5 4.01755 2.45771e-05 5.41389-48.65110.351099 0.169345 -0.276625 -0.175754 0.101633 70 -16.0034 42.6531-2147483648176.589 9 6.20249 0.247253 -0.139338 0.0669677 1.66582e+06 1233.43 9.9707 79b'NOT_AVAILABLE' 9.19541 1.02832 26.308 2.03485 -0.90549645.0103 -0.321544 0.243576 0.263603 -0.143727 0.107397 589318 2015 -106.23 19.3449 -44.7095 25.5226 0.335982 0.520842 0.35827 0.90504 163537841078193356826834955821312b'55-912-1'
9 nan 0.235866 49.3216 2 0 51 53 53 53 84 5 2.49518 2.42543e-05 15.7304 -48.59120.473472 0.163531 -0.0605532 -0.242013 0.14566 70 -15.8759 42.6549-2147483648176.419 8 1.67767 0.222067 -0.18584 0.0668122 1.96682e+06 1184.17 9.79036 62b'NOT_AVAILABLE' -24.5264 1.1319 9.10421 2.20939 -0.92529 44.9747 -0.407078 0.267911 0.236157 -0.0912424 0.0305957 1178636 2015 -99.9696 19.5819 -46.0718 24.0416 0.217998 0.655547 0.219464 0.892649163537841078193356833260226885120b'48-1139-1'

让我们绘制一个级别为2的healpix图。我们可以从计算每个healpix区域中的星星数量开始:

[21]:
level = 2
factor = 34359738368 * (4**(12-level))
nmax = hp.nside2npix(2**level)
counts = df.count(binby="source_id/" + str(factor), limits=[0, nmax], shape=nmax)
counts
[21]:
array([ 4021,  6171,  5318,  7114,  5755, 13420, 12711, 10193,  7782,
       14187, 12578, 22038, 17313, 13064, 17298, 11887,  3859,  3488,
        9036,  5533,  4007,  3899,  4884,  5664, 10741,  7678, 12092,
       10182,  6652,  6793, 10117,  9614,  3727,  5849,  4028,  5505,
        8462, 10059,  6581,  8282,  4757,  5116,  4578,  5452,  6023,
        8340,  6440,  8623,  7308,  6197, 21271, 23176, 12975, 17138,
       26783, 30575, 31931, 29697, 17986, 16987, 19802, 15632, 14273,
       10594,  4807,  4551,  4028,  4357,  4067,  4206,  3505,  4137,
        3311,  3582,  3586,  4218,  4529,  4360,  6767,  7579, 14462,
       24291, 10638, 11250, 29619,  9678, 23322, 18205,  7625,  9891,
        5423,  5808, 14438, 17251,  7833, 15226,  7123,  3708,  6135,
        4110,  3587,  3222,  3074,  3941,  3846,  3402,  3564,  3425,
        4125,  4026,  3689,  4084, 16617, 13577,  6911,  4837, 13553,
       10074,  9534, 20824,  4976,  6707,  5396,  8366, 13494, 19766,
       11012, 16130,  8521,  8245,  6871,  5977,  8789, 10016,  6517,
        8019,  6122,  5465,  5414,  4934,  5788,  6139,  4310,  4144,
       11437, 30731, 13741, 27285, 40227, 16320, 23039, 10812, 14686,
       27690, 15155, 32701, 18780,  5895, 23348,  6081, 17050, 28498,
       35232, 26223, 22341, 15867, 17688,  8580, 24895, 13027, 11223,
        7880,  8386,  6988,  5815,  4717,  9088,  8283, 12059,  9161,
        6952,  4914,  6652,  4666, 12014, 10703, 16518, 10270,  6724,
        4553,  9282,  4981])

使用healpy包,我们可以在摩尔维德投影中绘制这个

[22]:
hp.mollview(counts, nest=True);
../_images/guides_advanced_plotting_44_0.png

为了避免重复编写上述代码,我们可以使用df.healpix_count方法代替:

[23]:
counts = df.healpix_count(healpix_level=6)
hp.mollview(counts, nest=True)
../_images/guides_advanced_plotting_46_0.png

我们可以使用vaex的df.viz.healpix_plot方法,而不是使用healpy:

[24]:
df.viz.healpix_heatmap(f="log1p", healpix_level=6, figsize=(10,8), healpix_output="ecliptic")
../_images/guides_advanced_plotting_48_0.png