Bar charts
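In addition to plotting numerical data on continuous ranges, you can use Bokeh to plot categorical data on categorical ranges.
Basic categorical ranges are represented in Bokeh as sequences of strings. For example, a list of the four seasons: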
seasons = ["Winter", "Spring", "Summer", "Fall"]
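Bokeh can also handle hierarchical categories. For example, you can use nested sequences of strings to represent the individual months within each quarter of the year: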
months_by_quarter = [
("Q1", "Jan"), ("Q1", "Feb"), ("Q1", "Mar"),
("Q2", "Apr"), ("Q2", "May"), ("Q2", "Jun"),
("Q3", "Jul"), ("Q3", "Aug"), ("Q3", "Sep"),
("Q4", "Oct"), ("Q4", "Nov"), ("Q4", "Dec"),
]
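If the hierarchy already lives in a mapping, the nested factors can be generated rather than typed out. The following is a minimal sketch in plain Python; the `quarters` dictionary is an assumption made for illustration and is not part of Bokeh.

```python
# Hypothetical mapping from each quarter to its months (illustrative only).
quarters = {
    "Q1": ["Jan", "Feb", "Mar"],
    "Q2": ["Apr", "May", "Jun"],
    "Q3": ["Jul", "Aug", "Sep"],
    "Q4": ["Oct", "Nov", "Dec"],
}

# Each factor is a (quarter, month) tuple. Dicts preserve insertion order,
# so the factors come out in calendar order.
months_by_quarter = [
    (quarter, month)
    for quarter, months in quarters.items()
    for month in months
]

print(months_by_quarter[:3])  # [('Q1', 'Jan'), ('Q1', 'Feb'), ('Q1', 'Mar')]
```

The resulting list has the same shape as the hand-written one above.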
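Depending on the structure of your data, you can use different kinds of charts: bar charts, categorical heatmaps, jitter plots, and others. This chapter introduces several common plot types for categorical data.
Bar charts
One of the most common ways to handle categorical data is to present it in a bar chart. A bar chart has one categorical axis and one continuous axis. Bar charts are useful when there is one value to plot for each category.
The value for each category is represented by drawing a bar for that category. The length of this bar along the continuous axis corresponds to the value for that category.
Bar charts can also be stacked or grouped according to hierarchical sub-categories. This section demonstrates how to draw a variety of categorical bar charts.
Basic
To create a basic bar chart, use the hbar() (horizontal bars) or vbar() (vertical bars) glyph methods. The example below shows a sequence of simple one-level categories.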
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
To assign these categories to the x-axis, pass this list as the x_range argument to figure().
p = figure(x_range=fruits, ... )
Doing so is a convenient shorthand for creating a FactorRange object.
The equivalent explicit notation is:
p = figure(x_range=FactorRange(factors=fruits), ... )
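This form is useful when you want to customize the FactorRange, for example by changing the range or category padding.
Next, call vbar() with the list of fruit names as the x coordinate and the bar heights as the top coordinate. You can also specify a width or other optional properties.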
p.vbar(x=fruits, top=[5, 3, 4, 2, 4, 6], width=0.9)
Combining the steps above gives the following complete example:
from bokeh.plotting import figure, show
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
p = figure(x_range=fruits, height=350, title="Fruit Counts",
toolbar_location=None, tools="")
p.vbar(x=fruits, top=counts, width=0.9)
p.xgrid.grid_line_color = None
p.y_range.start = 0
show(p)
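Instead of passing the data directly as parameters, you can also assign the data to a ColumnDataSource and supply it as the source parameter to vbar(). You will see this in later examples.
Sorting
To order the bars of a given plot, sort the categories by value.
The example below sorts the fruit categories in ascending order by count and rearranges the bars accordingly.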
from bokeh.plotting import figure, show
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
# sorting the bars means sorting the range factors
sorted_fruits = sorted(fruits, key=lambda x: counts[fruits.index(x)])
p = figure(x_range=sorted_fruits, height=350, title="Fruit Counts",
toolbar_location=None, tools="")
p.vbar(x=fruits, top=counts, width=0.9)
p.xgrid.grid_line_color = None
p.y_range.start = 0
show(p)
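The same idea works for descending order. The sketch below is one possible variation in plain Python: it pairs each fruit with its count before sorting, which avoids the repeated fruits.index() lookup. The variable name `pairs` is introduced here for illustration only.

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]

# Pair each fruit with its count, then sort the pairs by count, largest first.
# sorted() is stable, so fruits with equal counts keep their original order.
pairs = sorted(zip(fruits, counts), key=lambda pair: pair[1], reverse=True)

# The fruit names, in sorted order, become the range factors.
sorted_fruits = [fruit for fruit, _ in pairs]

print(sorted_fruits)
# ['Strawberries', 'Apples', 'Nectarines', 'Grapes', 'Pears', 'Plums']
```

You can then pass sorted_fruits as the x_range argument to figure(), exactly as in the ascending example.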
Filling
Colors
You can color the bars in several ways:
Supply all the colors along with the rest of the data in a ColumnDataSource, and assign the name of the color column to the color argument of vbar().
from bokeh.models import ColumnDataSource
from bokeh.palettes import Bright6
from bokeh.plotting import figure, show
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
source = ColumnDataSource(data=dict(fruits=fruits, counts=counts, color=Bright6))
p = figure(x_range=fruits, y_range=(0,9), height=350, title="Fruit Counts",
           toolbar_location=None, tools="")
p.vbar(x='fruits', top='counts', width=0.9, color='color', legend_field="fruits", source=source)
p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_center"
show(p)
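You can also use the color column with the line_color and fill_color arguments to change the outline and fill colors separately.
Alternatively, use the CategoricalColorMapper model to map the bar colors in the browser. You can do this with the factor_cmap() function.
factor_cmap('fruits', palette=Spectral6, factors=fruits)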
You can then pass the result of this function to the color argument of vbar() (or, as in the example below, to fill_color) to achieve the same result:
from bokeh.models import ColumnDataSource
from bokeh.palettes import Bright6
from bokeh.plotting import figure, show
from bokeh.transform import factor_cmap
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
source = ColumnDataSource(data=dict(fruits=fruits, counts=counts))
p = figure(x_range=fruits, height=350, toolbar_location=None, title="Fruit Counts")
p.vbar(x='fruits', top='counts', width=0.9, source=source, legend_field="fruits",
       line_color='white', fill_color=factor_cmap('fruits', palette=Bright6, factors=fruits))
p.xgrid.grid_line_color = None
p.y_range.start = 0
p.y_range.end = 9
p.legend.orientation = "horizontal"
p.legend.location = "top_center"
show(p)
For more information about Bokeh's color mappers, see Client-side color mapping.
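To build intuition for what a categorical color mapping amounts to, the following plain-Python sketch pairs factors with palette entries by position. It is only a conceptual illustration and not Bokeh's implementation, since the real mapping happens in the browser. The palette values, the `lookup` helper, and the "gray" fallback are all assumptions made for this sketch.

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']

# Hypothetical six-color palette; any sequence of color strings would do.
palette = ["#4477aa", "#66ccee", "#228833", "#ccbb44", "#ee6677", "#aa3377"]

# Pair factors with palette entries by position.
color_for = dict(zip(fruits, palette))

def lookup(factor, fallback="gray"):
    """Return the color for a factor, or a fallback for unknown factors."""
    return color_for.get(factor, fallback)

print(lookup("Pears"))  # #66ccee
print(lookup("Kiwis"))  # gray
```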
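Stacking
To stack vertical bars, use the vbar_stack() function. The example below uses three sets of fruit data, one for each year. It produces one layer of bars per year and stacks the bar segments for each fruit on top of each other.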
from bokeh.palettes import HighContrast3
from bokeh.plotting import figure, show
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 4, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
p = figure(x_range=fruits, height=250, title="Fruit Counts by Year",
toolbar_location=None, tools="hover", tooltips="$name @fruits: @$name")
p.vbar_stack(years, x='fruits', width=0.9, color=HighContrast3, source=data,
legend_label=years)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
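Conceptually, stacking means that each layer starts where the previous one ended. The plain-Python sketch below computes the bottom and top of every bar segment for the data above. It is meant as an illustration of the arithmetic, not as a description of how Bokeh implements vbar_stack(), and the names `running` and `layers` are introduced here for illustration.

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
data = {'2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 4, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}

# Running total per fruit; every stack starts at the zero baseline.
running = [0] * len(fruits)
layers = {}
for year in years:
    bottoms = list(running)
    tops = [bottom + value for bottom, value in zip(bottoms, data[year])]
    layers[year] = (bottoms, tops)
    running = tops  # the next layer begins where this one ends

print(layers["2016"])  # ([2, 1, 4, 3, 2, 4], [7, 4, 8, 5, 6, 10])
print(layers["2017"][1])  # [10, 6, 12, 9, 11, 13]
```

The tops of the last layer are the totals per fruit, which is useful when choosing an upper bound for the y-range.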
You can also stack bars that represent positive and negative values:
from bokeh.models import ColumnDataSource
from bokeh.palettes import GnBu3, OrRd3
from bokeh.plotting import figure, show
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
exports = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 4, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
imports = {'fruits' : fruits,
'2015' : [-1, 0, -1, -3, -2, -1],
'2016' : [-2, -1, -3, -1, -2, -2],
'2017' : [-1, -2, -1, 0, -2, -2]}
p = figure(y_range=fruits, height=350, x_range=(-16, 16), title="Fruit import/export, by year",
toolbar_location=None)
p.hbar_stack(years, y='fruits', height=0.9, color=GnBu3, source=ColumnDataSource(exports),
legend_label=[f"{year} exports" for year in years])
p.hbar_stack(years, y='fruits', height=0.9, color=OrRd3, source=ColumnDataSource(imports),
legend_label=[f"{year} imports" for year in years])
p.y_range.range_padding = 0.1
p.ygrid.grid_line_color = None
p.legend.location = "top_left"
p.axis.minor_tick_line_color = None
p.outline_line_color = None
show(p)
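Tooltips
Bokeh automatically sets the name property of each layer to its name in the data set. You can use the $name variable to display the names in tooltips. You can also use the @$name tooltip variable to retrieve the value of each item in a layer from the data set.
The example below demonstrates both behaviors: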
from bokeh.palettes import HighContrast3
from bokeh.plotting import figure, show
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 4, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
p = figure(x_range=fruits, height=250, title="Fruit counts by year",
toolbar_location=None, tools="hover", tooltips="$name @fruits: @$name")
p.vbar_stack(years, x='fruits', width=0.9, color=HighContrast3, source=data,
legend_label=years)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
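You can override the value of name by passing it manually to the vbar_stack or hbar_stack function. In this case, $name shows the names you provide, and @$name looks up the columns with those names.
The hbar_stack and vbar_stack functions return a list of all the renderers (one per stacked layer). You can use this list to customize the tooltips for each layer.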
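from bokeh.models import HoverTool
# p, years, colors, and source are assumed to be defined as in the stacked examples above
renderers = p.vbar_stack(years, x='fruits', width=0.9, color=colors, source=source,
                         legend_label=years, name=years)
for r in renderers:
    year = r.name
    # attach a separate hover tool to each layer
    hover = HoverTool(tooltips=[
        (f"{year} total", f"@{year}"),
        ("index", "$index"),
    ], renderers=[r])
    p.add_tools(hover)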
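Grouping
Instead of stacking, you can also group the bars. Depending on your use case, you can achieve this in two ways: with nested categories or with visual offsets.
Nested categories
If you provide several subsets of data, Bokeh automatically groups the bars into labeled categories, tags each bar with the name of the subset it represents, and adds a separator between the categories.
The example below creates a sequence of fruit-year tuples and groups the bars by fruit name with a single call to vbar().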
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 3, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
# this creates [ ("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015"), ... ]
x = [ (fruit, year) for fruit in fruits for year in years ]
counts = sum(zip(data['2015'], data['2016'], data['2017']), ()) # like an hstack
source = ColumnDataSource(data=dict(x=x, counts=counts))
p = figure(x_range=FactorRange(*x), height=350, title="Fruit Counts by Year",
toolbar_location=None, tools="")
p.vbar(x='x', top='counts', width=0.9, source=source)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
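The sum(zip(...), ()) idiom in the example interleaves the per-year lists so that the counts line up with the (fruit, year) factors. As an optional alternative that uses only the standard library, the same flattening can be written with itertools.chain. This is a sketch of an equivalent spelling, and the name `per_fruit` is introduced here for illustration.

```python
from itertools import chain

years = ['2015', '2016', '2017']
data = {'2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 3, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}

# zip(*columns) yields one tuple per fruit: (2015 value, 2016 value, 2017 value).
per_fruit = zip(*(data[year] for year in years))

# Concatenate the per-fruit tuples into one flat tuple of counts.
counts = tuple(chain.from_iterable(per_fruit))

print(counts[:6])  # (2, 5, 3, 1, 3, 2)
```

Because it iterates over years, this version does not need to change when another year is added to the data.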
To apply different colors to the bars, use factor_cmap() as the fill_color in the vbar() function call, as follows:
p.vbar(x='x', top='counts', width=0.9, source=source, line_color="white",
# use the palette to colormap based on the x[1:2] values
fill_color=factor_cmap('x', palette=palette, factors=years, start=1, end=2))
In the call to factor_cmap(), start=1 and end=2 select the year part of each (fruit, year) pair for color mapping.
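The effect of start and end can be pictured as an ordinary Python slice applied to each factor tuple, as the x[1:2] comment in the snippet suggests. The sketch below uses plain Python and a shortened fruit list; it illustrates the slicing only and does not call Bokeh.

```python
fruits = ['Apples', 'Pears']
years = ['2015', '2016', '2017']
x = [(fruit, year) for fruit in fruits for year in years]

start, end = 1, 2

# Slicing each (fruit, year) tuple with [1:2] keeps only the year part.
sliced = [factor[start:end] for factor in x]

print(sliced[:3])  # [('2015',), ('2016',), ('2017',)]
```

Every bar for the same year therefore maps to the same color, regardless of the fruit.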
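Visual offsets
Consider a scenario with separate sequences of (fruit, year) pairs instead of a single data table. You can plot the sequences with separate calls to vbar(). However, since every bar in each group belongs to the same fruit category, the bars would overlap. To avoid this, use the dodge() function to provide an offset for each call to vbar().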
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
from bokeh.transform import dodge
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']
data = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 3, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
source = ColumnDataSource(data=data)
p = figure(x_range=fruits, y_range=(0, 10), title="Fruit Counts by Year",
height=350, toolbar_location=None, tools="")
p.vbar(x=dodge('fruits', -0.25, range=p.x_range), top='2015', source=source,
width=0.2, color="#c9d9d3", legend_label="2015")
p.vbar(x=dodge('fruits', 0.0, range=p.x_range), top='2016', source=source,
width=0.2, color="#718dbf", legend_label="2016")
p.vbar(x=dodge('fruits', 0.25, range=p.x_range), top='2017', source=source,
width=0.2, color="#e84d60", legend_label="2017")
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
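The offsets -0.25, 0.0, and 0.25 in the example are evenly spaced and centered on zero. For a different number of series, offsets of this kind can be computed instead of written by hand. The helper below, `centered_offsets`, is a hypothetical function written for this sketch and is not part of Bokeh. The sketch assumes that each category spans one unit along the axis, so the bar width should stay below the step, and the step times the number of series should stay below 1.

```python
def centered_offsets(n, step):
    """Return n offsets spaced `step` apart and centered on zero."""
    first = -step * (n - 1) / 2
    return [first + i * step for i in range(n)]

offsets = centered_offsets(3, 0.25)
print(offsets)  # [-0.25, 0.0, 0.25]
```

Each offset would then be passed as the second argument of one dodge() call, paired with the column for that series.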
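Stacking and grouping
You can also combine the above techniques to create charts with stacked and grouped bars. Here is an example that groups bars by quarter and stacks them by region: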
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show
factors = [
("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
]
regions = ['east', 'west']
source = ColumnDataSource(data=dict(
x=factors,
east=[ 5, 5, 6, 5, 5, 4, 5, 6, 7, 8, 6, 9 ],
west=[ 5, 7, 9, 4, 5, 4, 7, 7, 7, 6, 6, 7 ],
))
p = figure(x_range=FactorRange(*factors), height=250,
toolbar_location=None, tools="")
p.vbar_stack(regions, x='x', width=0.9, alpha=0.5, color=["blue", "red"], source=source,
legend_label=regions)
p.y_range.start = 0
p.y_range.end = 18
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
p.legend.location = "top_center"
p.legend.orientation = "horizontal"
show(p)
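Mixed factors
You can use any level of a multi-level data structure to position glyphs.
The example below groups the bars for each month into financial quarters and adds a line of quarterly averages at the group center coordinates, from Q1 to Q4.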
from bokeh.models import FactorRange
from bokeh.palettes import TolPRGn4
from bokeh.plotting import figure, show
quarters = ("Q1", "Q2", "Q3", "Q4")
months = (
("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
)
fill_color, line_color = TolPRGn4[2:]
p = figure(x_range=FactorRange(*months), height=500, tools="",
background_fill_color="#fafafa", toolbar_location=None)
monthly = [10, 13, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16]
p.vbar(x=months, top=monthly, width=0.8,
fill_color=fill_color, fill_alpha=0.8, line_color=line_color, line_width=1.2)
quarterly = [13, 9, 13, 14]
p.line(x=quarters, y=quarterly, color=line_color, line_width=3)
p.scatter(x=quarters, y=quarterly, size=10,
line_color=line_color, fill_color="white", line_width=3)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
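The quarterly values in the example are written out by hand, and they equal the per-quarter means of the monthly values. The plain-Python sketch below derives them from the monthly data instead; the name `by_quarter` is introduced here for illustration.

```python
months = (
    ("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
    ("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
    ("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
    ("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
)
monthly = [10, 13, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16]

# Collect the monthly values under their quarter (the first element of each factor).
by_quarter = {}
for (quarter, _month), value in zip(months, monthly):
    by_quarter.setdefault(quarter, []).append(value)

quarters = tuple(by_quarter)
quarterly = [sum(values) / len(values) for values in by_quarter.values()]

print(quarters)   # ('Q1', 'Q2', 'Q3', 'Q4')
print(quarterly)  # [13.0, 9.0, 13.0, 14.0]
```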
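Using pandas
pandas is a powerful and popular tool for analyzing tabular and time series data in Python. Although it is not required, it can make working with Bokeh more convenient.
For example, you can use the GroupBy objects that pandas provides to initialize a ColumnDataSource, which automatically creates columns for many statistical measures such as group mean and count. You can also pass these GroupBy objects as a range argument to figure.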
from bokeh.palettes import Spectral5
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg import autompg as df
from bokeh.transform import factor_cmap
df.cyl = df.cyl.astype(str)
group = df.groupby('cyl')
cyl_cmap = factor_cmap('cyl', palette=Spectral5, factors=sorted(df.cyl.unique()))
p = figure(height=350, x_range=group, title="MPG by # Cylinders",
toolbar_location=None, tools="")
p.vbar(x='cyl', top='mpg_mean', width=1, source=group,
line_color=cyl_cmap, fill_color=cyl_cmap)
p.y_range.start = 0
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Number of cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
show(p)
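The example above groups the data by the column 'cyl', which is why the ColumnDataSource includes this column. It also adds associated columns for the non-grouped columns such as 'mpg', for example providing the mean miles per gallon in the 'mpg_mean' column.
This also works for multi-level groups. The example below groups the same data by ('cyl', 'mfr') and displays it in nested categories distributed along the x-axis. Here, the index column name 'cyl_mfr' is made by joining the names of the grouped columns.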
from bokeh.palettes import MediumContrast5
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg import autompg_clean as df
from bokeh.transform import factor_cmap
df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)
group = df.groupby(['cyl', 'mfr'])
index_cmap = factor_cmap('cyl_mfr', palette=MediumContrast5, factors=sorted(df.cyl.unique()), end=1)
p = figure(width=800, height=300, title="Mean MPG by # Cylinders and Manufacturer",
x_range=group, toolbar_location=None, tooltips=[("MPG", "@mpg_mean"), ("Cyl, Mfr", "@cyl_mfr")])
p.vbar(x='cyl_mfr', top='mpg_mean', width=1, source=group,
line_color="white", fill_color=index_cmap )
p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Manufacturer grouped by # Cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
show(p)
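Intervals
You can use bars for more than just bar charts with a common baseline. If each category has a start value and an end value, you can also use bars to represent an interval across a range for each category.
The example below supplies the hbar() function with both left and right properties to show the spread in times between gold and bronze medalists in Olympic sprinting over many years.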
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
from bokeh.sampledata.sprint import sprint
df = sprint.copy() # since we are modifying sampledata
df.Year = df.Year.astype(str)
group = df.groupby('Year')
source = ColumnDataSource(group)
p = figure(y_range=group, x_range=(9.5,12.7), width=400, height=550, toolbar_location=None,
title="Time Spreads for Sprint Medalists (by Year)")
p.hbar(y="Year", left='Time_min', right='Time_max', height=0.4, source=source)
p.ygrid.grid_line_color = None
p.xaxis.axis_label = "Time (seconds)"
p.outline_line_color = None
show(p)
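The 'Time_min' and 'Time_max' columns hold the smallest and largest time within each year. The plain-Python sketch below shows the same per-group reduction on a few made-up records. The values are invented for illustration and are not taken from the sprint sample data, and the names `records` and `intervals` are introduced only for this sketch.

```python
# Made-up (year, time in seconds) records, for illustration only.
records = [
    ("2000", 10.2), ("2000", 10.0), ("2000", 10.3),
    ("2004", 10.1), ("2004", 10.4), ("2004", 10.2),
]

# Track the (min, max) interval seen so far for each year.
intervals = {}
for year, time in records:
    low, high = intervals.get(year, (time, time))
    intervals[year] = (min(low, time), max(high, time))

print(intervals)  # {'2000': (10.0, 10.3), '2004': (10.1, 10.4)}
```

Each (min, max) pair corresponds to the left and right ends of one horizontal bar.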