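Bar charts

In addition to plotting numerical data on continuous ranges, you can also use Bokeh to plot categorical data on categorical ranges.

Basic categorical ranges are represented in Bokeh as sequences of strings. For example, a list of the four seasons: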

seasons = ["Winter", "Spring", "Summer", "Fall"]

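Bokeh can also handle hierarchical categories. For example, you can use nested sequences of strings to represent the individual months within each yearly quarter: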

months_by_quarter = [
    ("Q1", "Jan"), ("Q1", "Feb"), ("Q1", "Mar"),
    ("Q2", "Apr"), ("Q2", "May"), ("Q2", "Jun"),
    ("Q3", "Jul"), ("Q3", "Aug"), ("Q3", "Sep"),
    ("Q4", "Oct"), ("Q4", "Nov"), ("Q4", "Dec"),
]
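
A nested list like this can also be generated instead of typed out. The following is a minimal sketch in plain Python; the quarters dictionary is a hypothetical helper introduced for illustration and is not part of Bokeh.

```python
# Hypothetical helper mapping: quarter label -> month labels (not a Bokeh API)
quarters = {
    "Q1": ["Jan", "Feb", "Mar"],
    "Q2": ["Apr", "May", "Jun"],
    "Q3": ["Jul", "Aug", "Sep"],
    "Q4": ["Oct", "Nov", "Dec"],
}

# Flatten into (quarter, month) tuples, keeping the dictionary's insertion order
months_by_quarter = [
    (quarter, month)
    for quarter, months in quarters.items()
    for month in months
]
```

The result is the same sequence of 2-tuples as the hand-written list above.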

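Depending on the structure of your data, you can use different kinds of charts: bar charts, categorical heatmaps, jitter plots, and others. This chapter introduces several common plot types for categorical data.

Bars

One of the most common ways to handle categorical data is to present it in a bar chart. Bar charts have one categorical axis and one continuous axis. They are useful when there is one value to plot for each category.

The value associated with each category is represented by drawing a bar for that category. The length of this bar along the continuous axis corresponds to the value for that category.

Bar charts can also be stacked or grouped according to hierarchical sub-categories. This section demonstrates how to draw a variety of categorical bar charts.

Basic

To create a basic bar chart, use the hbar() (horizontal bars) or vbar() (vertical bars) glyph methods. The example below shows a sequence of simple 1-level categories.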

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']

To assign these categories to the x-axis, pass this list as the x_range argument to figure():

p = figure(x_range=fruits, ... )

Doing so is a convenient shorthand for creating a FactorRange object. The equivalent explicit notation is:

p = figure(x_range=FactorRange(factors=fruits), ... )

This form is useful when you want to customize the FactorRange, for example by changing the range padding or the padding between categories.
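
As an illustration of the explicit form, the sketch below sets two padding properties on the FactorRange before passing it to figure(). The padding values are arbitrary choices made for this example, not recommendations.

```python
from bokeh.models import FactorRange
from bokeh.plotting import figure

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']

# range_padding adds space at both ends of the axis;
# factor_padding adds space between neighboring categories.
# Both values below are arbitrary and only chosen for illustration.
x_range = FactorRange(factors=fruits, range_padding=0.1, factor_padding=0.2)

p = figure(x_range=x_range, height=350, title="Fruit Counts")
p.vbar(x=fruits, top=[5, 3, 4, 2, 4, 6], width=0.9)
```

Pass p to show() to display the result.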

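Next, call vbar() with the list of fruit names as the x coordinate and the bar heights as the top coordinate. You can also specify width or other optional properties.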

p.vbar(x=fruits, top=[5, 3, 4, 2, 4, 6], width=0.9)

Putting the steps above together gives the following complete example:

from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]

p = figure(x_range=fruits, height=350, title="Fruit Counts",
           toolbar_location=None, tools="")

p.vbar(x=fruits, top=counts, width=0.9)

p.xgrid.grid_line_color = None
p.y_range.start = 0

show(p)

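Instead of passing the data directly as arguments, you can also assign the data to a ColumnDataSource and supply it as the source argument to vbar(). You will see this in later examples.

Sorting

To order the bars of a given plot, sort the categories by value.

The example below sorts the fruit categories in ascending order by count and rearranges the bars accordingly.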

from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]

# sorting the bars means sorting the range factors
sorted_fruits = sorted(fruits, key=lambda x: counts[fruits.index(x)])

p = figure(x_range=sorted_fruits, height=350, title="Fruit Counts",
           toolbar_location=None, tools="")

p.vbar(x=fruits, top=counts, width=0.9)

p.xgrid.grid_line_color = None
p.y_range.start = 0

show(p)
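
The same idea works for descending order. The sketch below is plain Python and sorts (fruit, count) pairs directly, which avoids calling fruits.index() for every element; the variable names are illustrative only.

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]

# Sort (fruit, count) pairs by count, largest first.
# sorted() is stable, so fruits with equal counts keep their original order.
pairs = sorted(zip(fruits, counts), key=lambda pair: pair[1], reverse=True)

# Only the factor order matters for the range; the glyph data can stay unsorted.
sorted_fruits = [fruit for fruit, _ in pairs]
```

Passing sorted_fruits as x_range to figure() then draws the tallest bar first.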

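Filling

Colors

You can color the bars in several ways:

  • Supply all the colors to a ColumnDataSource along with the rest of the data, and assign the name of the color column to the color argument of vbar().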

    from bokeh.models import ColumnDataSource
    from bokeh.palettes import Bright6
    from bokeh.plotting import figure, show
    
    fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
    counts = [5, 3, 4, 2, 4, 6]
    
    source = ColumnDataSource(data=dict(fruits=fruits, counts=counts, color=Bright6))
    
    p = figure(x_range=fruits, y_range=(0,9), height=350, title="Fruit Counts",
               toolbar_location=None, tools="")
    
    p.vbar(x='fruits', top='counts', width=0.9, color='color', legend_field="fruits", source=source)
    
    p.xgrid.grid_line_color = None
    p.legend.orientation = "horizontal"
    p.legend.location = "top_center"
    
    show(p)
    

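    You can also use the color column with the line_color and fill_color arguments to change the outline and fill colors separately.

  • Use the CategoricalColorMapper model to map the bar colors in the browser. You can do this with the factor_cmap() function.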

    factor_cmap('fruits', palette=Spectral6, factors=fruits)
    

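    You can then pass the result of this function to the color argument of vbar() (the example below uses fill_color) to achieve the same result as above: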

    from bokeh.models import ColumnDataSource
    from bokeh.palettes import Bright6
    from bokeh.plotting import figure, show
    from bokeh.transform import factor_cmap
    
    fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
    counts = [5, 3, 4, 2, 4, 6]
    
    source = ColumnDataSource(data=dict(fruits=fruits, counts=counts))
    
    p = figure(x_range=fruits, height=350, toolbar_location=None, title="Fruit Counts")
    
    p.vbar(x='fruits', top='counts', width=0.9, source=source, legend_field="fruits",
           line_color='white', fill_color=factor_cmap('fruits', palette=Bright6, factors=fruits))
    
    p.xgrid.grid_line_color = None
    p.y_range.start = 0
    p.y_range.end = 9
    p.legend.orientation = "horizontal"
    p.legend.location = "top_center"
    
    show(p)
    

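    For more information on Bokeh's color mappers, see Client-side color mapping.

Stacking

To stack vertical bars, use the vbar_stack() function. The example below uses three sets of fruit data, one per year. It draws one layer of bars for each set and stacks the layers for each fruit on top of each other.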

from bokeh.palettes import HighContrast3
from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]

data = {'fruits' : fruits,
        '2015'   : [2, 1, 4, 3, 2, 4],
        '2016'   : [5, 3, 4, 2, 4, 6],
        '2017'   : [3, 2, 4, 4, 5, 3]}

p = figure(x_range=fruits, height=250, title="Fruit Counts by Year",
           toolbar_location=None, tools="hover", tooltips="$name @fruits: @$name")

p.vbar_stack(years, x='fruits', width=0.9, color=HighContrast3, source=data,
             legend_label=years)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"

show(p)
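
Conceptually, each stacked layer starts where the previous one ends. The plain-Python sketch below computes those (bottom, top) boundaries for the data above. It only illustrates the arithmetic of stacking and is not how Bokeh implements vbar_stack().

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]

data = {'2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 4, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}

# Running total per fruit; every stack starts at the common baseline 0
running = [0] * len(fruits)
layers = {}

for year in years:
    bottoms = list(running)
    tops = [bottom + value for bottom, value in zip(bottoms, data[year])]
    layers[year] = list(zip(bottoms, tops))  # one (bottom, top) pair per fruit
    running = tops
```

For Apples, the three layers span 0 to 2, 2 to 7, and 7 to 10.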

You can also stack bars that represent positive and negative values:

from bokeh.models import ColumnDataSource
from bokeh.palettes import GnBu3, OrRd3
from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]

exports = {'fruits' : fruits,
           '2015'   : [2, 1, 4, 3, 2, 4],
           '2016'   : [5, 3, 4, 2, 4, 6],
           '2017'   : [3, 2, 4, 4, 5, 3]}
imports = {'fruits' : fruits,
           '2015'   : [-1, 0, -1, -3, -2, -1],
           '2016'   : [-2, -1, -3, -1, -2, -2],
           '2017'   : [-1, -2, -1, 0, -2, -2]}

p = figure(y_range=fruits, height=350, x_range=(-16, 16), title="Fruit import/export, by year",
           toolbar_location=None)

p.hbar_stack(years, y='fruits', height=0.9, color=GnBu3, source=ColumnDataSource(exports),
             legend_label=[f"{year} exports" for year in years])

p.hbar_stack(years, y='fruits', height=0.9, color=OrRd3, source=ColumnDataSource(imports),
             legend_label=[f"{year} imports" for year in years])

p.y_range.range_padding = 0.1
p.ygrid.grid_line_color = None
p.legend.location = "top_left"
p.axis.minor_tick_line_color = None
p.outline_line_color = None

show(p)
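
The example hard-codes x_range=(-16, 16). One way to derive such a symmetric range from the data is sketched below in plain Python. The stack_totals helper and the headroom of 3 units are assumptions made for this illustration.

```python
years = ["2015", "2016", "2017"]

exports = {'2015': [2, 1, 4, 3, 2, 4],
           '2016': [5, 3, 4, 2, 4, 6],
           '2017': [3, 2, 4, 4, 5, 3]}
imports = {'2015': [-1, 0, -1, -3, -2, -1],
           '2016': [-2, -1, -3, -1, -2, -2],
           '2017': [-1, -2, -1, 0, -2, -2]}


def stack_totals(table, columns):
    """Sum the given columns row by row (hypothetical helper, not a Bokeh API)."""
    return [sum(values) for values in zip(*(table[c] for c in columns))]


export_totals = stack_totals(exports, years)
import_totals = stack_totals(imports, years)

# Largest distance from zero on either side, plus some arbitrary headroom
limit = max(max(export_totals), -min(import_totals))
headroom = 3
x_range = (-limit - headroom, limit + headroom)
```

With this data the longest stack reaches 13, so the computed range matches the (-16, 16) used above.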

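Tooltips

Bokeh automatically sets the name property of each layer to its name in the data set. You can use the $name variable to display the names in tooltips. You can also use the @$name tooltip variable to retrieve the value of each item in a layer from the data set.

The example below demonstrates both behaviors: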

from bokeh.palettes import HighContrast3
from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]

data = {'fruits' : fruits,
        '2015'   : [2, 1, 4, 3, 2, 4],
        '2016'   : [5, 3, 4, 2, 4, 6],
        '2017'   : [3, 2, 4, 4, 5, 3]}

p = figure(x_range=fruits, height=250, title="Fruit counts by year",
           toolbar_location=None, tools="hover", tooltips="$name @fruits: @$name")

p.vbar_stack(years, x='fruits', width=0.9, color=HighContrast3, source=data,
             legend_label=years)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"

show(p)

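You can override the value of name by passing it manually to the vbar_stack and hbar_stack functions. In this case, $name corresponds to the names you provide, and @$name looks up the columns with those names.

The hbar_stack and vbar_stack functions return a list of all the renderers (one per bar stack). You can use this list to customize the tooltips for each layer.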

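from bokeh.models import HoverTool

# Assumes years, data, and HighContrast3 from the example above, and a figure p
# created without tools="hover" and tooltips, so that only the tools below apply.
renderers = p.vbar_stack(years, x='fruits', width=0.9, color=HighContrast3, source=data,
                         legend_label=years, name=years)

# Add one hover tool per layer, restricted to that layer's renderer
for r in renderers:
    year = r.name
    hover = HoverTool(tooltips=[
        (f"{year} total", f"@{year}"),
        ("index", "$index")
    ], renderers=[r])
    p.add_tools(hover)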

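Grouping

Instead of stacking, you can also group the bars. Depending on your use case, you can achieve this in two ways: with nested categories or with visual offsets.

Nested categories

If you provide several subsets of data, Bokeh automatically groups the bars into labeled categories, tags each bar with the name of the subset it represents, and adds a separator between the categories.

The example below creates a sequence of fruit-year tuples and groups the bars by fruit name with a single call to vbar().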

from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'fruits' : fruits,
        '2015'   : [2, 1, 4, 3, 2, 4],
        '2016'   : [5, 3, 3, 2, 4, 6],
        '2017'   : [3, 2, 4, 4, 5, 3]}

# this creates [ ("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015"), ... ]
x = [ (fruit, year) for fruit in fruits for year in years ]
counts = sum(zip(data['2015'], data['2016'], data['2017']), ()) # like an hstack

source = ColumnDataSource(data=dict(x=x, counts=counts))

p = figure(x_range=FactorRange(*x), height=350, title="Fruit Counts by Year",
           toolbar_location=None, tools="")

p.vbar(x='x', top='counts', width=0.9, source=source)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)
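
The sum(zip(...), ()) idiom in the example interleaves the yearly columns so that counts lines up with x. An equivalent, arguably more readable, formulation is sketched below in plain Python; the lookup dictionary exists only to check the alignment.

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 3, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}

# Same loop order for both lists: fruits on the outside, years on the inside
x = [(fruit, year) for fruit in fruits for year in years]
counts = [data[year][i] for i in range(len(fruits)) for year in years]

# Pair each factor with its count to verify that the two lists are aligned
lookup = dict(zip(x, counts))
```

Using the same nesting order in both comprehensions is what keeps each (fruit, year) factor paired with the right value.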

To apply different colors to the bars, use factor_cmap() for fill_color in the vbar() function call as follows:

p.vbar(x='x', top='counts', width=0.9, source=source, line_color="white",

       # use the palette to colormap based on the x[1:2] values
       fill_color=factor_cmap('x', palette=palette, factors=years, start=1, end=2))

In the call to factor_cmap(), start=1 and end=2 select the year part of each (fruit, year) pair for color mapping.
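
The effect of start and end can be pictured as a slice of each factor tuple. The sketch below imitates this in plain Python. It is a conceptual illustration only: the actual mapping is done in the browser by the color mapper, and both color_for and the palette values are hypothetical.

```python
years = ['2015', '2016', '2017']
palette = ["#c9d9d3", "#718dbf", "#e84d60"]  # hypothetical colors, one per year


def color_for(factor, start=1, end=2):
    """Pick a color from the sliced part of a (fruit, year) factor.

    Hypothetical helper that mimics the idea behind start/end;
    it is not part of Bokeh.
    """
    sub = factor[start:end]                   # ("Apples", "2016") -> ("2016",)
    key = sub[0] if len(sub) == 1 else sub    # unwrap single-level slices
    return palette[years.index(key)]


colors = [color_for(f) for f in [("Apples", "2015"), ("Apples", "2016"), ("Pears", "2017")]]
```

Bars that share a year therefore share a color, regardless of the fruit.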

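Visual offset

Consider a scenario with separate sequences of (fruit, year) pairs instead of a single data table. You can plot these sequences with separate calls to vbar(). However, since every bar in each group belongs to the same fruit category, the bars would overlap. To avoid this, use the dodge() function to provide an offset for each call to vbar().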

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
from bokeh.transform import dodge

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'fruits' : fruits,
        '2015'   : [2, 1, 4, 3, 2, 4],
        '2016'   : [5, 3, 3, 2, 4, 6],
        '2017'   : [3, 2, 4, 4, 5, 3]}

source = ColumnDataSource(data=data)

p = figure(x_range=fruits, y_range=(0, 10), title="Fruit Counts by Year",
           height=350, toolbar_location=None, tools="")

p.vbar(x=dodge('fruits', -0.25, range=p.x_range), top='2015', source=source,
       width=0.2, color="#c9d9d3", legend_label="2015")

p.vbar(x=dodge('fruits',  0.0,  range=p.x_range), top='2016', source=source,
       width=0.2, color="#718dbf", legend_label="2016")

p.vbar(x=dodge('fruits',  0.25, range=p.x_range), top='2017', source=source,
       width=0.2, color="#e84d60", legend_label="2017")

p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"

show(p)
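
The offsets -0.25, 0.0, and 0.25 in the example are written by hand. For a different number of series, they could be computed as in the sketch below. The dodge_offsets helper is hypothetical and not part of Bokeh.

```python
def dodge_offsets(n_bars, step):
    """Offsets that spread n_bars symmetrically around a category center.

    Hypothetical helper for illustration; not a Bokeh function.
    """
    middle = (n_bars - 1) / 2
    return [(i - middle) * step for i in range(n_bars)]


years = ['2015', '2016', '2017']

# One offset per series, keyed by the column it belongs to
offsets = dict(zip(years, dodge_offsets(len(years), 0.25)))
```

Each value can then be passed as the second argument of dodge() in a loop over years, with the bar width kept smaller than the step so that neighboring bars do not touch.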

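Stacking and grouping

You can also combine the techniques above to create bar charts that are both stacked and grouped. The following example groups the bars by quarter and stacks them by region: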

from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show

factors = [
    ("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
    ("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
    ("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
    ("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),

]

regions = ['east', 'west']

source = ColumnDataSource(data=dict(
    x=factors,
    east=[ 5, 5, 6, 5, 5, 4, 5, 6, 7, 8, 6, 9 ],
    west=[ 5, 7, 9, 4, 5, 4, 7, 7, 7, 6, 6, 7 ],
))

p = figure(x_range=FactorRange(*factors), height=250,
           toolbar_location=None, tools="")

p.vbar_stack(regions, x='x', width=0.9, alpha=0.5, color=["blue", "red"], source=source,
             legend_label=regions)

p.y_range.start = 0
p.y_range.end = 18
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
p.legend.location = "top_center"
p.legend.orientation = "horizontal"

show(p)

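Mixed factors

You can use any level in a multi-level data structure to position glyphs.

The example below groups the bars for each month into financial quarters and adds a line of quarterly averages, from Q1 to Q4, at the group center coordinates.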

from bokeh.models import FactorRange
from bokeh.palettes import TolPRGn4
from bokeh.plotting import figure, show

quarters = ("Q1", "Q2", "Q3", "Q4")

months = (
    ("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
    ("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
    ("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
    ("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
)

fill_color, line_color = TolPRGn4[2:]

p = figure(x_range=FactorRange(*months), height=500, tools="",
           background_fill_color="#fafafa", toolbar_location=None)

monthly = [10, 13, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16]
p.vbar(x=months, top=monthly, width=0.8,
       fill_color=fill_color, fill_alpha=0.8, line_color=line_color, line_width=1.2)

quarterly = [13, 9, 13, 14]
p.line(x=quarters, y=quarterly, color=line_color, line_width=3)
p.scatter(x=quarters, y=quarterly, size=10,
          line_color=line_color, fill_color="white", line_width=3)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)
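
The quarterly values in the example are the means of the monthly values in each quarter. The plain-Python sketch below derives them from the monthly data instead of hard-coding them; by_quarter is an illustrative name.

```python
from statistics import mean

months = (
    ("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
    ("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
    ("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
    ("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
)
monthly = [10, 13, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16]

# Collect the monthly values under their top-level factor (the quarter)
by_quarter = {}
for (quarter, _month), value in zip(months, monthly):
    by_quarter.setdefault(quarter, []).append(value)

quarters = tuple(by_quarter)                                   # top-level factors
quarterly = [mean(values) for values in by_quarter.values()]   # one mean per quarter
```

Because quarters contains only the top-level factors, glyphs positioned with them are placed at the center of each group.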

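Using pandas

pandas is a powerful and popular tool for analyzing tabular and time series data in Python. While it is not required, it can make working with Bokeh more convenient.

For example, you can use the GroupBy objects provided by pandas to initialize a ColumnDataSource, which automatically creates columns for many statistical measures such as group mean and count. You can also pass these GroupBy objects as the range argument to figure().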

from bokeh.palettes import Spectral5
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg import autompg as df
from bokeh.transform import factor_cmap

df.cyl = df.cyl.astype(str)
group = df.groupby('cyl')

cyl_cmap = factor_cmap('cyl', palette=Spectral5, factors=sorted(df.cyl.unique()))

p = figure(height=350, x_range=group, title="MPG by # Cylinders",
           toolbar_location=None, tools="")

p.vbar(x='cyl', top='mpg_mean', width=1, source=group,
       line_color=cyl_cmap, fill_color=cyl_cmap)

p.y_range.start = 0
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Number of cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None

show(p)

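The example above groups the data by the column 'cyl', which is why the ColumnDataSource includes this column. It also adds associated columns for the non-grouped columns such as 'mpg', for example providing the mean miles per gallon in the 'mpg_mean' column.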
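
To picture what a column such as 'mpg_mean' holds, the sketch below performs the same kind of aggregation in plain Python. The rows are made-up numbers, not the real autompg data, and the summary dictionary only imitates the column naming. The real data source is built by Bokeh from the pandas GroupBy and contains more statistics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (cyl, mpg) rows for illustration; not the real autompg data
rows = [("4", 30.0), ("4", 26.0), ("6", 20.0), ("8", 15.0), ("8", 13.0)]

# Group the mpg values by their cyl label
groups = defaultdict(list)
for cyl, mpg in rows:
    groups[cyl].append(mpg)

# One entry per group in every column, named <column>_<statistic>
labels = sorted(groups)
summary = {
    "cyl": labels,
    "mpg_count": [len(groups[c]) for c in labels],
    "mpg_mean": [mean(groups[c]) for c in labels],
}
```

Each column has exactly one value per group, which is why the group labels can serve as the categorical range.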

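This also works with multi-level groups. The example below groups the same data by ('cyl', 'mfr') and displays it in nested categories distributed along the x-axis. Here, the index column name 'cyl_mfr' is made by joining the names of the grouped columns.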

from bokeh.palettes import MediumContrast5
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg import autompg_clean as df
from bokeh.transform import factor_cmap

df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)

group = df.groupby(['cyl', 'mfr'])

index_cmap = factor_cmap('cyl_mfr', palette=MediumContrast5, factors=sorted(df.cyl.unique()), end=1)

p = figure(width=800, height=300, title="Mean MPG by # Cylinders and Manufacturer",
           x_range=group, toolbar_location=None, tooltips=[("MPG", "@mpg_mean"), ("Cyl, Mfr", "@cyl_mfr")])

p.vbar(x='cyl_mfr', top='mpg_mean', width=1, source=group,
       line_color="white", fill_color=index_cmap )

p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Manufacturer grouped by # Cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None

show(p)

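Intervals

You can use bars for more than bar charts with a common baseline. If each category has both a start value and an end value, you can also use bars to represent an interval across a range for each category.

The example below supplies the hbar() function with both left and right properties to show the spread in times between gold and bronze medalists in Olympic sprinting over many years.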

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
from bokeh.sampledata.sprint import sprint

df = sprint.copy()  # since we are modifying sampledata

df.Year = df.Year.astype(str)
group = df.groupby('Year')
source = ColumnDataSource(group)

p = figure(y_range=group, x_range=(9.5,12.7), width=400, height=550, toolbar_location=None,
           title="Time Spreads for Sprint Medalists (by Year)")
p.hbar(y="Year", left='Time_min', right='Time_max', height=0.4, source=source)

p.ygrid.grid_line_color = None
p.xaxis.axis_label = "Time (seconds)"
p.outline_line_color = None

show(p)