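Advanced plotting examples#
To try this notebook with a live Python kernel, use mybinder.
Vaex uses matplotlib to create plots, which offers a great deal of flexibility. To avoid repetitive boilerplate code, Vaex tries to cover many use cases in which one or more panels can be plotted in a simple declarative style.
The examples below use the example dataset, which is the result of a numerical simulation of the formation of a galaxy similar to our Milky Way (source). The data contains the 3D position, velocity, angular momentum, energy and iron content of each starting particle in the simulation.
Let's start by loading the data: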
[1]:
import vaex
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
[2]:
df = vaex.example()
df.head()
[2]:
| # | id | x | y | z | vx | vy | vz | E | L | Lz | FeH |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.23187 | -0.396929 | -0.598058 | 301.155 | 174.059 | 27.4275 | -149431 | 407.389 | 333.956 | -1.00539 |
| 1 | 23 | -0.163701 | 3.65422 | -0.254906 | -195 | 170.472 | 142.53 | -124248 | 890.241 | 684.668 | -1.70867 |
| 2 | 32 | -2.12026 | 3.32605 | 1.70784 | -48.6342 | 171.647 | -2.07944 | -138501 | 372.241 | -202.176 | -1.83361 |
| 3 | 8 | 4.71559 | 4.58525 | 2.25154 | -232.421 | -294.851 | 62.8586 | -60037 | 1297.63 | -324.688 | -1.47869 |
| 4 | 16 | 7.21719 | 11.9947 | -1.06456 | -1.68917 | 181.329 | -11.3336 | -83206.8 | 1332.8 | 1328.95 | -1.85705 |
| 5 | 16 | -7.78437 | 5.98977 | -0.682695 | 86.7009 | -238.778 | -2.31309 | -86497.6 | 1353.25 | 1339.42 | -1.91944 |
| 6 | 12 | 8.08373 | -3.27348 | 5.54687 | -57.4544 | 120.117 | 5.37438 | -101867 | 1100.8 | 782.915 | -1.93517 |
| 7 | 26 | -3.55719 | 5.41363 | 0.0917156 | -67.0511 | -145.933 | 39.6374 | -127682 | 921.008 | 882.101 | -1.79423 |
| 8 | 25 | 3.9848 | 5.40691 | 2.57724 | -38.7449 | -152.407 | -92.9073 | -113632 | 493.316 | -397.824 | -1.18076 |
| 9 | 8 | -20.8139 | -3.29468 | 13.4866 | 99.4067 | 28.6749 | -115.079 | -55825.3 | 1088.46 | -269.324 | -1.28892 |
A single plot#
The simplest case is a single heatmap spanned by two axes, which are given by the first two arguments:
[3]:
df.viz.heatmap('x', 'y', title='Face on galaxy', limits='99%')
Multiple plots of the same type#
The first argument can also be a list of axis pairs. This produces multiple plots:
[4]:
df.viz.heatmap([["x", "y"], ["x", "z"]], title="Face on and edge on", figsize=(10, 4), limits='99%');
Multiple plots, same axes, different statistics#
If the what argument is a list, it creates multiple subplots by default:
[5]:
df.viz.heatmap("x", "y", what=["count(*)", "mean(vx)", "correlation(vy,vz)"],
title="Different statistics",
figsize=(10, 5), limits='99%');
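Multiple plots, different axes, different statistics#
You can pass multiple axis pairs as the first argument together with a list for the what argument. The resulting figure contains multiple subplots, in which the different axis combinations form the rows and the different what statistics form the columns: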
[6]:
df.viz.heatmap([["x", "y"], ["x", "z"], ["y", "z"]],
what=["count(*)", "mean(vx)", "correlation(vx,vy)", "correlation(vx,vz)"],
title="Different statistics and plots",
figsize=(14,12),
limits='99%');
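The layout of the figure can also be specified through the visual argument, which can be used to swap the row and column order of the subplots: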
[7]:
df.viz.heatmap([["x", "y"], ["x", "z"], ["y", "z"]],
what=["count(*)", "mean(vx)", "correlation(vx,vy)", "correlation(vx,vz)"],
visual=dict(row="what", column="subspace"),
title="Different statistics and plots",
figsize=(14,12),
limits='99%');
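The row and column arrangement described above can be sketched without Vaex. The helper `panel_grid` below is hypothetical (it is not part of the Vaex API) and only illustrates how a mapping such as dict(row="what", column="subspace") decides which list spans the rows and which spans the columns.

```python
def panel_grid(subspaces, whats, visual=None):
    """Hypothetical layout helper: return (n_rows, n_columns, grid).

    The default mirrors the text: axis pairs form rows, statistics form columns.
    """
    visual = visual or {"row": "subspace", "column": "what"}
    axes = {"subspace": subspaces, "what": whats}
    rows, columns = axes[visual["row"]], axes[visual["column"]]
    # one (row item, column item) entry per panel
    grid = [[(r, c) for c in columns] for r in rows]
    return len(rows), len(columns), grid


subspaces = [("x", "y"), ("x", "z"), ("y", "z")]
whats = ["count(*)", "mean(vx)", "correlation(vx,vy)", "correlation(vx,vz)"]

default_shape = panel_grid(subspaces, whats)[:2]
swapped_shape = panel_grid(
    subspaces, whats, visual={"row": "what", "column": "subspace"}
)[:2]
print(default_shape, swapped_shape)
```

With three axis pairs and four statistics, the default layout is 3 rows by 4 columns, and the swapped layout is 4 rows by 3 columns.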
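Slices in a third dimension#
If a third axis (z) is given, you can "slice" through the data and display the z slices as rows. Note that the rows are wrapped here, which can be changed with the wrap_columns argument: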
[8]:
df.viz.heatmap("Lz", "E", z="FeH:-3,-1,8",
visual=dict(row="z"),
figsize=(12, 8),
f="log",
wrap_columns=3,
limits='99%');
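The string passed as z in the cell above can be read as expression:min,max,number of slices. The sketch below uses a hypothetical helper, `parse_slice_spec`, to make that reading explicit. It assumes equal-width slices and is not how Vaex parses the argument internally.

```python
def parse_slice_spec(spec):
    """Split 'expression:min,max,count' into the expression and slice edges.

    Assumes equal-width slices; this is an illustration, not Vaex code.
    """
    expression, _, rest = spec.partition(":")
    lo, hi, count = rest.split(",")
    lo, hi, count = float(lo), float(hi), int(count)
    width = (hi - lo) / count
    edges = [lo + i * width for i in range(count + 1)]
    return expression, edges


expression, edges = parse_slice_spec("FeH:-3,-1,8")
print(expression, edges)
```

Under this reading, "FeH:-3,-1,8" gives eight slices of width 0.25 between -3 and -1. With wrap_columns=3 those eight panels wrap into three rows.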
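Many plots with wrapping#
If you create a figure with many subplots, they are wrapped into a tidy grid. For the example dataset, we create heatmaps of all column combinations, sorted by their mutual information: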
[9]:
# Get all column pairs
pairs = df.combinations(exclude=['id'])
# Calculate the mutual information for each pair, sorted by the largest value
mi, pairs_sorted = df.mutual_information(pairs, sort=True)
# Create the figure
df.viz.heatmap(pairs_sorted, f='log', colorbar=False, figsize=(14, 20), limits='99%', wrap_columns=5);
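To get a feel for how many panels the cell above produces, the count can be estimated with the standard library alone. This sketch assumes that the pairs are the unordered combinations of the ten non-id columns shown in the table at the top of the notebook; the actual output of df.combinations may differ if the DataFrame has additional columns.

```python
from itertools import combinations
from math import ceil

# Columns from the df.head() table, with 'id' excluded (an assumption)
columns = ["x", "y", "z", "vx", "vy", "vz", "E", "L", "Lz", "FeH"]

pairs = list(combinations(columns, 2))
wrap_columns = 5
n_rows = ceil(len(pairs) / wrap_columns)
print(len(pairs), n_rows)
```

Under that assumption there are 45 pairs, which wrap into 9 rows of 5 panels. This is consistent with the tall figsize=(14, 20) used above.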
Plotting selections#
If the selection argument is used, only the selected part of the data is plotted:
[10]:
df.viz.heatmap("x", "y", selection="sqrt(x**2+y**2) < 5", limits=[-10, 10]);
If a list of selections is given (where False or None means no selection), then by default each selection forms a different "layer" of the resulting figure:
[11]:
df.viz.heatmap("x", "y",
selection=[None, "sqrt(x**2+y**2) < 5", "(sqrt(x**2+y**2) < 7) & (x < 0)"],
limits=[-10, 10]);
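The selection strings above are boolean expressions that are evaluated per row. As a rough illustration, the same two conditions can be applied in plain Python to three rows copied from the df.head() table (rows 0, 2 and 4). Vaex evaluates the expressions itself, so this sketch only shows what they mean.

```python
from math import sqrt

# (x, y) positions of rows 0, 2 and 4 from the df.head() table
rows = [
    {"x": 1.23187, "y": -0.396929},
    {"x": -2.12026, "y": 3.32605},
    {"x": 7.21719, "y": 11.9947},
]

# "sqrt(x**2+y**2) < 5"
inner = [sqrt(r["x"] ** 2 + r["y"] ** 2) < 5 for r in rows]

# "(sqrt(x**2+y**2) < 7) & (x < 0)"
left_half = [
    (sqrt(r["x"] ** 2 + r["y"] ** 2) < 7) and (r["x"] < 0) for r in rows
]
print(inner, left_half)
```

Rows 0 and 2 lie within a radius of 5, but only row 2 also has a negative x, so it is the only one of the three that belongs to the third layer.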
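Overplotting a vector field on a heatmap#
Astronomers believe that galaxies like our Milky Way formed through the merging and mixing of many pre-galactic clumps. One way to look for the original pre-galactic fragments is to examine the 2D distribution of their energy (\(E\)) and angular momentum (\(L_z\)). So let's make such a plot: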
[12]:
df.viz.heatmap('Lz', 'E', f='log', figsize=(9, 6));
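Now, to show that the stars in each clump in the plot above really do move coherently in space, we can overplot their velocity vectors on a heatmap of their positions.
First, let's select the stars that belong to one of the clumps: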
[13]:
# specify ranges of angular momentum (Lz) and energy (E)
limits_Lz_E_clump = (1181.770, 1291.92), (-70850.91, -68491.16)
# Use the rectangle selection method
df.select_rectangle("Lz", "E", limits_Lz_E_clump, name="stream")
# Check how many stars we have selected
print(f'Selection contains {df.count(selection="stream")} "stars".')
Selection contains 9556 "stars".
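Conceptually, a rectangle selection is the conjunction of two range tests, one per axis. The sketch below spells that out in plain Python with a hypothetical helper, `in_rectangle`. Treating the edges as inclusive is an assumption made here; the text does not state how Vaex handles values that lie exactly on an edge.

```python
# Same limits as in the cell above: (Lz range), (E range)
limits_Lz_E_clump = (1181.770, 1291.92), (-70850.91, -68491.16)


def in_rectangle(lz, e, limits):
    """Hypothetical helper: True if (lz, e) lies inside the rectangle.

    Edges are treated as inclusive, which is an assumption of this sketch.
    """
    (lz_min, lz_max), (e_min, e_max) = limits
    return lz_min <= lz <= lz_max and e_min <= e <= e_max


# A made-up point inside the rectangle
inside = in_rectangle(1200.0, -70000.0, limits_Lz_E_clump)
# Row 4 of the df.head() table: Lz=1328.95, E=-83206.8
row4 = in_rectangle(1328.95, -83206.8, limits_Lz_E_clump)
print(inside, row4)
```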
We can also overplot the selected region to convince ourselves that we picked a sensible one:
[14]:
df.viz.heatmap("Lz", "E", selection=[None, "stream"], f="log", figsize=(9, 6));
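Now let's plot the \(v_y\) and \(v_z\) velocity vectors on the \(y\)-\(z\) plane. First, we compute the mean \(v_y\) and \(v_z\) velocities on a grid. Note that the limits restrict the binned \(y\) and \(z\) positions to the range -20 to 20, and that the grid has a resolution of 32x32 bins: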
[15]:
limits = [-20, 20]
shape_vector = 32
mean_vy = df.mean("vy", binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
mean_vz = df.mean("vz", binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
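What a binned mean computes can be shown with a small pure-Python sketch. The function `binned_mean` below is hypothetical and is not how Vaex implements df.mean. It assumes the same limits on both axes, half-open bins, and NaN for empty bins.

```python
from math import floor


def binned_mean(ys, zs, values, limits, shape):
    """Hypothetical sketch: mean of `values` on a shape x shape grid in (y, z)."""
    lo, hi = limits
    width = (hi - lo) / shape
    sums = [[0.0] * shape for _ in range(shape)]
    counts = [[0] * shape for _ in range(shape)]
    for y, z, v in zip(ys, zs, values):
        if not (lo <= y < hi and lo <= z < hi):
            continue  # points outside the limits are ignored
        i, j = floor((y - lo) / width), floor((z - lo) / width)
        sums[i][j] += v
        counts[i][j] += 1
    means = [
        [s / c if c else float("nan") for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(sums, counts)
    ]
    return means, counts


# Three made-up stars on a 4x4 grid covering [-20, 20) in y and z
means, counts = binned_mean(
    ys=[-15, -12, 5], zs=[-15, -11, 5], values=[10, 20, 30],
    limits=[-20, 20], shape=4,
)
print(means[0][0], means[2][2], counts[0][0])
```

The first axis of the result corresponds to y and the second to z, matching the order of the binby list in the cell above.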
Next, let's create a grid that holds the centers of the bins:
[16]:
# create 2d arrays which hold the centers of the bins
centers = np.linspace(*limits, shape_vector, endpoint=False) + (limits[1] - limits[0])/2./shape_vector
z, y = np.meshgrid(centers, centers)
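The order of the two outputs of np.meshgrid matters here, because the binned arrays have y along the first axis and z along the second. The following sketch repeats the cell above on a grid of only 4 bins (a size chosen purely for readability) so the values can be inspected by eye.

```python
import numpy as np

limits = [-20, 20]
shape_vector = 4  # smaller than the 32 used above, for readability only

# left bin edges, shifted by half a bin width to get the bin centers
bin_width = (limits[1] - limits[0]) / shape_vector
centers = np.linspace(*limits, shape_vector, endpoint=False) + bin_width / 2

# With the default indexing, the first output varies along axis 1 (here z)
# and the second output varies along axis 0 (here y).
z, y = np.meshgrid(centers, centers)

print(centers)
print(y[:, 0], z[0, :])
```

Each row of y holds a single y value and each column of z holds a single z value, which is the same axis order as the arrays returned for binby=["y", "z"].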
To keep the plot "clean", we also do not want to show the velocities for bins that contain only a few stars:
[17]:
# we don't want to show bins with low number of counts
counts = df.count(binby=["y", "z"], limits=limits, shape=shape_vector, selection='stream')
mask = counts.flatten() > 10
Finally, we can plot a background density map of \(y\) versus \(z\), and then use plt.quiver to overplot the mean velocity vectors:
[18]:
df.viz.heatmap("y", "z", limits=limits, f="log1p", figsize=(10, 9), selection=[None, "stream"], shape=128)
# overplot the mean velocity vectors
plt.quiver(y.flatten()[mask],
z.flatten()[mask],
mean_vy.flatten()[mask],
mean_vz.flatten()[mask],
color="white",
alpha=0.75);
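We can indeed see that the stars we selected move together and form a stream!
Plotting healpix maps#
Healpix is made available by the healpy package. Vaex does not need special support for healpix, but it introduces a few helper functions that make working with healpix more convenient.
Make sure you have healpy installed. If not, you can install it with one of the following commands: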
!pip install healpy # if you prefer pip
!conda install -c conda-forge healpy # if you are using the conda package manager
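To understand healpix better, we will start from the beginning. If we want to make a density sky map, we need to pass healpy a 1D numpy array in which each value represents the density at a location on the sphere. The location is determined by the size of the array (the healpix level) and the offset within the array (the position).
This example uses the Gaia TGAS dataset. The Gaia data includes the healpix index encoded in the source_id column. Dividing source_id by 34359738368 gives the healpix index at level 12, and dividing further, by a factor of 4 per level, takes you to lower levels.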
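The index arithmetic can be checked with plain integers. In the sketch below, `healpix_index` and `n_pixels` are hypothetical helpers written for this illustration. The divisor 34359738368 equals 2**35, and the pixel count uses the standard healpix relation of 12 * nside**2 pixels with nside = 2**level. The sample source_id is the first value shown in the TGAS table in this notebook.

```python
LEVEL12_FACTOR = 34359738368  # equals 2**35


def healpix_index(source_id, level):
    """Hypothetical helper: healpix index of a Gaia source_id at `level`."""
    factor = LEVEL12_FACTOR * 4 ** (12 - level)
    return source_id // factor


def n_pixels(level):
    """Number of healpix pixels at `level` (nside = 2**level)."""
    nside = 2 ** level
    return 12 * nside ** 2


source_id = 7627862074752  # first row of the TGAS table
print(healpix_index(source_id, 12), healpix_index(source_id, 2))
print(n_pixels(2), n_pixels(6))
```

At level 2 there are 192 pixels, which is why a level 2 count array has 192 entries.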
Let's start by fetching the dataset (note: the dataset is about 700MB on disk).
[19]:
import healpy as hp
[20]:
df = vaex.datasets.tgas(full=True)
df.head()
[20]:
| # | astrometric_delta_q | astrometric_excess_noise | astrometric_excess_noise_sig | astrometric_n_bad_obs_ac | astrometric_n_bad_obs_al | astrometric_n_good_obs_ac | astrometric_n_good_obs_al | astrometric_n_obs_ac | astrometric_n_obs_al | astrometric_primary_flag | astrometric_priors_used | astrometric_relegation_factor | astrometric_weight_ac | astrometric_weight_al | b | dec | dec_error | dec_parallax_corr | dec_pmdec_corr | dec_pmra_corr | duplicated_source | ecl_lat | ecl_lon | hip | l | matched_observations | parallax | parallax_error | parallax_pmdec_corr | parallax_pmra_corr | phot_g_mean_flux | phot_g_mean_flux_error | phot_g_mean_mag | phot_g_n_obs | phot_variable_flag | pmdec | pmdec_error | pmra | pmra_error | pmra_pmdec_corr | ra | ra_dec_corr | ra_error | ra_parallax_corr | ra_pmdec_corr | ra_pmra_corr | random_index | ref_epoch | scan_direction_mean_k1 | scan_direction_mean_k2 | scan_direction_mean_k3 | scan_direction_mean_k4 | scan_direction_strength_k1 | scan_direction_strength_k2 | scan_direction_strength_k3 | scan_direction_strength_k4 | solution_id | source_id | tycho2_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.91906 | 0.717101 | 412.606 | 1 | 0 | 78 | 79 | 79 | 79 | 84 | 3 | 2.9361 | 1.26696e-05 | 1.81816 | -48.7144 | 0.235392 | 0.218802 | -0.407338 | 0.0606588 | -0.0994513 | 70 | -16.1211 | 42.6418 | 13989 | 176.74 | 9 | 6.35295 | 0.30791 | -0.101957 | -0.00157679 | 1.03123e+07 | 10577.4 | 7.99138 | 77 | b'NOT_AVAILABLE' | -7.64199 | 0.0874018 | 43.7523 | 0.0705422 | 0.214677 | 45.0343 | -0.414972 | 0.305989 | 0.179966 | -0.0857597 | 0.159207 | 243619 | 2015 | -113.76 | 21.3929 | -41.6784 | 26.2018 | 0.382348 | 0.538266 | 0.392379 | 0.916306 | 1635378410781933568 | 7627862074752 | b'' |
| 1 | nan | 0.253463 | 47.3163 | 2 | 0 | 55 | 57 | 57 | 57 | 84 | 5 | 2.65231 | 3.16002e-05 | 12.8616 | -48.645 | 0.200068 | 1.19779 | 0.837626 | -0.975644 | 0.972577 | 70 | -16.193 | 42.7612 | -2147483648 | 176.916 | 8 | 3.90033 | 0.323488 | -0.853779 | 0.839739 | 949565 | 1140.17 | 10.581 | 62 | b'NOT_AVAILABLE' | -55.1092 | 2.52293 | 10.0363 | 4.61141 | -0.996399 | 45.165 | -0.995923 | 2.58388 | -0.860911 | 0.97348 | -0.972417 | 487238 | 2015 | -156.433 | 22.7661 | -36.2397 | 22.8906 | 0.711003 | 0.96597 | 0.646115 | 0.86716 | 1635378410781933568 | 9277129363072 | b'55-28-1' |
| 2 | nan | 0.398901 | 221.185 | 4 | 1 | 57 | 60 | 61 | 61 | 84 | 5 | 3.9934 | 2.56339e-05 | 5.76753 | -48.6678 | 0.248825 | 0.180326 | -0.391891 | -0.193256 | 0.0894205 | 70 | -16.1234 | 42.6975 | -2147483648 | 176.78 | 7 | 3.15531 | 0.273484 | -0.118552 | -0.0418587 | 817838 | 1827.38 | 10.7431 | 60 | b'NOT_AVAILABLE' | -1.60287 | 1.03526 | 2.93228 | 1.90864 | -0.914271 | 45.0862 | -0.177443 | 0.213836 | 0.307722 | -0.184817 | 0.0468668 | 1948952 | 2015 | -117.008 | 19.7722 | -43.1082 | 26.7157 | 0.482528 | 0.428758 | 0.524153 | 0.903062 | 1635378410781933568 | 13297218905216 | b'55-1191-1' |
| 3 | nan | 0.422492 | 179.982 | 1 | 0 | 51 | 52 | 52 | 52 | 84 | 5 | 4.21516 | 2.86726e-05 | 5.36086 | -48.6824 | 0.248211 | 0.200958 | -0.337217 | -0.223501 | 0.131811 | 70 | -16.1182 | 42.6778 | -2147483648 | 176.76 | 7 | 2.29237 | 0.280972 | -0.109202 | -0.0494409 | 602053 | 905.877 | 11.0757 | 61 | b'NOT_AVAILABLE' | -18.4149 | 1.12985 | 3.66198 | 2.06505 | -0.926177 | 45.0665 | -0.365707 | 0.276039 | 0.202878 | -0.0589288 | -0.0509089 | 102321 | 2015 | -132.421 | 22.5693 | -38.9545 | 25.8786 | 0.494655 | 0.638456 | 0.509074 | 0.898918 | 1635378410781933568 | 13469017597184 | b'55-624-1' |
| 4 | nan | 0.3175 | 119.748 | 2 | 3 | 85 | 84 | 87 | 87 | 84 | 5 | 3.23564 | 2.22788e-05 | 8.08078 | -48.572 | 0.335044 | 0.17013 | -0.438708 | -0.279349 | 0.121792 | 70 | -16.0555 | 42.7734 | -2147483648 | 176.739 | 11 | 1.58208 | 0.261539 | -0.329196 | 0.100312 | 1.38812e+06 | 2826.43 | 10.1687 | 96 | b'NOT_AVAILABLE' | -2.37939 | 0.710632 | 0.340802 | 1.22048 | -0.833604 | 45.136 | -0.0490526 | 0.170697 | 0.471425 | -0.156392 | -0.152076 | 409284 | 2015 | -106.86 | 4.4521 | -47.8954 | 26.7555 | 0.520654 | 0.23931 | 0.653377 | 0.863385 | 1635378410781933568 | 15736760328576 | b'55-849-1' |
| 5 | nan | 0.303723 | 64.6868 | 2 | 1 | 68 | 69 | 70 | 70 | 84 | 5 | 3.10892 | 2.22511e-05 | 9.65279 | -48.5511 | 0.359618 | 0.179848 | -0.437142 | -0.376402 | 0.257906 | 70 | -16.0335 | 42.7861 | -2147483648 | 176.718 | 9 | 8.66308 | 0.255867 | -0.297309 | 0.0791063 | 1.66384e+06 | 1381.58 | 9.97199 | 76 | b'NOT_AVAILABLE' | -72.7114 | 0.720852 | -52.8493 | 1.26429 | -0.852784 | 45.1414 | -0.264588 | 0.205008 | 0.39493 | 0.102073 | -0.36853 | 204642 | 2015 | -127.824 | 16.3828 | -44.2417 | 25.1631 | 0.522809 | 0.479366 | 0.621515 | 0.847412 | 1635378410781933568 | 16527034310784 | b'55-182-1' |
| 6 | nan | 0.340405 | 118.911 | 2 | 1 | 76 | 77 | 78 | 78 | 84 | 5 | 3.44745 | 2.19728e-05 | 7.91894 | -48.5242 | 0.386343 | 0.17188 | -0.341053 | -0.34408 | 0.1516 | 70 | -16.0114 | 42.8058 | -2147483648 | 176.701 | 9 | 5.6982 | 0.263677 | -0.367848 | 0.0846782 | 1.821e+06 | 2755.91 | 9.874 | 77 | b'NOT_AVAILABLE' | -3.35036 | 0.707184 | 24.5272 | 1.17738 | -0.800098 | 45.153 | -0.0412512 | 0.189524 | 0.488929 | -0.163855 | -0.195289 | 540954 | 2015 | -114.478 | 11.0431 | -46.4507 | 26.2651 | 0.512088 | 0.322961 | 0.637399 | 0.856398 | 1635378410781933568 | 16733192740608 | b'55-867-1' |
| 7 | nan | 0.253709 | 88.6261 | 3 | 0 | 76 | 79 | 79 | 79 | 84 | 5 | 2.65453 | 2.57372e-05 | 13.709 | -48.5569 | 0.380844 | 0.150943 | -0.139315 | -0.358996 | 0.238914 | 70 | -16.0049 | 42.7641 | -2147483648 | 176.665 | 10 | 2.09081 | 0.222206 | -0.277202 | 0.093748 | 967144 | 601.802 | 10.561 | 87 | b'NOT_AVAILABLE' | -11.6616 | 0.982994 | -1.57293 | 1.73319 | -0.904223 | 45.1128 | -0.187136 | 0.206981 | 0.412381 | 0.0994892 | -0.284353 | 1081909 | 2015 | -88.3027 | 14.7861 | -47.9744 | 27.0228 | 0.39079 | 0.333692 | 0.400387 | 0.90071 | 1635378410781933568 | 16870631694208 | b'55-72-1' |
| 8 | nan | 0.401473 | 226.044 | 3 | 1 | 69 | 71 | 72 | 72 | 84 | 5 | 4.01755 | 2.45771e-05 | 5.41389 | -48.6511 | 0.351099 | 0.169345 | -0.276625 | -0.175754 | 0.101633 | 70 | -16.0034 | 42.6531 | -2147483648 | 176.589 | 9 | 6.20249 | 0.247253 | -0.139338 | 0.0669677 | 1.66582e+06 | 1233.43 | 9.9707 | 79 | b'NOT_AVAILABLE' | 9.19541 | 1.02832 | 26.308 | 2.03485 | -0.905496 | 45.0103 | -0.321544 | 0.243576 | 0.263603 | -0.143727 | 0.107397 | 589318 | 2015 | -106.23 | 19.3449 | -44.7095 | 25.5226 | 0.335982 | 0.520842 | 0.35827 | 0.90504 | 1635378410781933568 | 26834955821312 | b'55-912-1' |
| 9 | nan | 0.235866 | 49.3216 | 2 | 0 | 51 | 53 | 53 | 53 | 84 | 5 | 2.49518 | 2.42543e-05 | 15.7304 | -48.5912 | 0.473472 | 0.163531 | -0.0605532 | -0.242013 | 0.14566 | 70 | -15.8759 | 42.6549 | -2147483648 | 176.419 | 8 | 1.67767 | 0.222067 | -0.18584 | 0.0668122 | 1.96682e+06 | 1184.17 | 9.79036 | 62 | b'NOT_AVAILABLE' | -24.5264 | 1.1319 | 9.10421 | 2.20939 | -0.92529 | 44.9747 | -0.407078 | 0.267911 | 0.236157 | -0.0912424 | 0.0305957 | 1178636 | 2015 | -99.9696 | 19.5819 | -46.0718 | 24.0416 | 0.217998 | 0.655547 | 0.219464 | 0.892649 | 1635378410781933568 | 33260226885120 | b'48-1139-1' |
Let's make a healpix plot at level 2. We can start by counting the number of stars in each healpix pixel:
[21]:
level = 2
factor = 34359738368 * (4**(12-level))
nmax = hp.nside2npix(2**level)
counts = df.count(binby="source_id/" + str(factor), limits=[0, nmax], shape=nmax)
counts
[21]:
array([ 4021, 6171, 5318, 7114, 5755, 13420, 12711, 10193, 7782,
14187, 12578, 22038, 17313, 13064, 17298, 11887, 3859, 3488,
9036, 5533, 4007, 3899, 4884, 5664, 10741, 7678, 12092,
10182, 6652, 6793, 10117, 9614, 3727, 5849, 4028, 5505,
8462, 10059, 6581, 8282, 4757, 5116, 4578, 5452, 6023,
8340, 6440, 8623, 7308, 6197, 21271, 23176, 12975, 17138,
26783, 30575, 31931, 29697, 17986, 16987, 19802, 15632, 14273,
10594, 4807, 4551, 4028, 4357, 4067, 4206, 3505, 4137,
3311, 3582, 3586, 4218, 4529, 4360, 6767, 7579, 14462,
24291, 10638, 11250, 29619, 9678, 23322, 18205, 7625, 9891,
5423, 5808, 14438, 17251, 7833, 15226, 7123, 3708, 6135,
4110, 3587, 3222, 3074, 3941, 3846, 3402, 3564, 3425,
4125, 4026, 3689, 4084, 16617, 13577, 6911, 4837, 13553,
10074, 9534, 20824, 4976, 6707, 5396, 8366, 13494, 19766,
11012, 16130, 8521, 8245, 6871, 5977, 8789, 10016, 6517,
8019, 6122, 5465, 5414, 4934, 5788, 6139, 4310, 4144,
11437, 30731, 13741, 27285, 40227, 16320, 23039, 10812, 14686,
27690, 15155, 32701, 18780, 5895, 23348, 6081, 17050, 28498,
35232, 26223, 22341, 15867, 17688, 8580, 24895, 13027, 11223,
7880, 8386, 6988, 5815, 4717, 9088, 8283, 12059, 9161,
6952, 4914, 6652, 4666, 12014, 10703, 16518, 10270, 6724,
4553, 9282, 4981])
Using the healpy package, we can plot this in a Mollweide projection:
[22]:
hp.mollview(counts, nest=True);
To avoid repeating the code above, we can use the df.healpix_count method instead:
[23]:
counts = df.healpix_count(healpix_level=6)
hp.mollview(counts, nest=True)
Instead of using healpy, we can also use vaex's df.viz.healpix_heatmap method:
[24]:
df.viz.healpix_heatmap(f="log1p", healpix_level=6, figsize=(10,8), healpix_output="ecliptic")