Seaborn

Seaborn is a high-level interface to matplotlib. It is great for rapidly creating standard yet pretty data visualizations.

Pros:

  • Easy-to-use, high-level
  • Pretty statistical data visualizations by default
  • Theming and color palettes
  • Extendable and modifiable with matplotlib code

Cons:

  • Limited support for complex visualizations (in which case, consider matplotlib)

To install, run:

pip install seaborn

In Jupyter notebooks, make sure matplotlib's inline mode is enabled (for example, with the %matplotlib inline magic) to see the plots. Alternatively, call:

import matplotlib.pyplot as plt
plt.show()

Code

import seaborn as sns
import seaborn.objects as so

import matplotlib as mpl
import matplotlib.pyplot as plt

import numpy as np
import pandas as pd

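sns.set_theme()

# Example datasets used in the snippets below (downloaded on first use)
iris = sns.load_dataset('iris')
fmri = sns.load_dataset('fmri')
flights = sns.load_dataset('flights')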

Theming

If no theme is set, the default matplotlib theme is used.

Action Code Details
Set default theme
sns.set_theme()
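
As a rough sketch of what setting a theme does: sns.set_theme() takes style and palette arguments (among others) and works by updating matplotlib's rcParams. The 'whitegrid' style and 'deep' palette are built-in seaborn names; picking them here is purely illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Apply a built-in style and palette; this changes matplotlib's rcParams globally.
sns.set_theme(style="whitegrid", palette="deep")
grid_enabled = plt.rcParams["axes.grid"]  # the whitegrid style turns the grid on

# Use another style temporarily, without touching the global theme.
with sns.axes_style("white"):
    grid_inside = plt.rcParams["axes.grid"]  # grid is off inside this block

grid_after = plt.rcParams["axes.grid"]  # global theme is restored afterwards
```

The context-manager form is handy when a single figure should deviate from the notebook-wide theme.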

Plots

Distribution plots

See https://seaborn.pydata.org/generated/seaborn.displot.html

Boxplots

Action Code Details
Boxplot
sns.boxplot(iris, x='sepal_length')
Boxplot by group
sns.boxplot(iris, x='sepal_length', y='species')
Facetted boxplot by two grouping factors
sns.catplot(fmri, kind='box', x='signal', y='region', col='event')
Boxplot by two grouping factors
sns.boxplot(fmri, x='signal', y='region', hue='event')

Histograms

Action Code Details
Plot histogram
sns.histplot(iris, x='sepal_length')
Plot histogram with bins centered at discrete values
sns.histplot(iris, x='sepal_length', discrete=True)
For example, at integer values
Plot histogram with bin-width
sns.histplot(iris, x='sepal_length', binwidth=1.0)
Plot histogram with a given number of bins
sns.histplot(iris, x='sepal_length', bins=5)
Plot histogram and kernel density
sns.histplot(iris, x='sepal_length', kde=True)
Plot normalized histogram
sns.histplot(iris, x='sepal_length', stat='proportion')
Plot normalized histogram as percentage
sns.histplot(iris, x='sepal_length', stat='percent')
Facetted histogram chart
sns.displot(iris, x='sepal_length', col='species')

Kernel density plots

Action Code Details
Plot kernel density
sns.kdeplot(iris, x='sepal_length')
Plot kernel density by group
sns.kdeplot(iris, x='sepal_length', hue='species')
Facetted kernel density chart
sns.displot(iris, kind='kde', x='sepal_length', col='species')
Violin plot
sns.violinplot(iris, x='sepal_length')
Violin plot with grouping factor
sns.violinplot(iris, x='sepal_length', y='species')

Empirical cumulative distribution plots

Action Code Details
Plot stepped cumulative distribution
sns.ecdfplot(iris, x='sepal_length')
Plot stepped complementary cumulative distribution
sns.ecdfplot(iris, x='sepal_length', complementary=True)
Plot stepped cumulative distribution by group
sns.displot(iris, kind='ecdf', x='sepal_length', hue='species')
Facetted plot of stepped cumulative distribution
sns.displot(iris, kind='ecdf', x='sepal_length', col='species')
Plot cumulative distribution as barplot
sns.histplot(iris, x='sepal_length', cumulative=True, stat='proportion')
Plot cumulative distribution by group as barplot
sns.histplot(iris, x='sepal_length', hue='species',
    cumulative=True, stat='proportion', common_norm=False)
Each group reaches 1.0
Facetted plot of cumulative distribution by group as barplot
sns.displot(iris, x='sepal_length',
    cumulative=True, stat='proportion', common_norm=False, col='species')
Each group reaches 1.0
Plot stepped cumulative distribution with rug plot
sns.displot(iris, kind='ecdf', x='sepal_length', rug=True)
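
For intuition about what the stepped curves show, here is a minimal NumPy sketch of an empirical cumulative distribution computed by hand. It illustrates the idea only and is not seaborn's internal implementation; the sample values are made up.

```python
import numpy as np

values = np.array([5.1, 4.9, 6.3, 5.8, 5.1, 7.0])

# Sort the observations; the i-th sorted value has i/n of the data at or below it.
x = np.sort(values)
ecdf = np.arange(1, len(x) + 1) / len(x)

# The complementary curve (complementary=True) is simply 1 minus the ECDF.
complementary = 1.0 - ecdf

# Tied values (5.1 appears twice) share an x position; the curve jumps by 2/n there,
# so the ECDF at a tied value is the largest proportion recorded for it.
ecdf_at_5_1 = ecdf[x == 5.1].max()
```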

Bivariate plots

Plots involving two continuous variables.

Scatter plots

Action Code Details
Scatter plot
sns.scatterplot(iris, x='sepal_length', y='sepal_width')
Scatter plot with grouping factor
sns.scatterplot(iris, x='sepal_length', y='sepal_width', hue='species')
Facetted scatter plot
sns.relplot(
    data=iris, kind='scatter',
    x='sepal_length', y='sepal_width',
    col='species'
)

Joint plots

Action Code Details
Scatter plot with marginal histograms
sns.jointplot(iris, x='sepal_length', y='sepal_width')
Scatter plot in the center, histograms along the axes
Scatter plot with KDE axes, with grouping factor
sns.jointplot(iris, x='sepal_length', y='sepal_width', hue='species')
Plot heatmap with histograms along axes
sns.jointplot(iris, kind='hist', x='sepal_length', y='sepal_width')

Line plots

Action Code Details
Line plot
sns.lineplot(flights.query('month=="May"'), x='year', y='passengers', hue='month')
Line plot, with separate lines per group
sns.lineplot(flights, x='year', y='passengers', hue='month')
Line plot with multiple observations per x value, drawn as the mean with a 95% confidence band
sns.lineplot(flights, x='year', y='passengers')
Facetted line plot
sns.relplot(
    data=fmri, kind='line',
    x='timepoint', y='signal',
    col='region', row='event'
)

Multivariate plots

Plots involving more than two continuous variables.

Action Code Details
Pairs plot
sns.pairplot(iris)
Plots pairwise scatter plots, with histograms along the diagonal (KDE curves when hue is set)

Facetted plotting options

The function for creating facetted plots differs between the kinds of plots, so only options are described here. See, for example, sns.displot(), sns.relplot() and sns.catplot().

Action Code Details
Horizontally stacked facets
col='region'
Horizontally stacked facets, wrap after n columns
col='region', col_wrap=n
Vertically stacked facets
row='region'
Vertically stacked facets, wrapped
Not supported: seaborn has no row_wrap argument. Use col='region', col_wrap=n instead
Grid-based facets along two grouping factors
col='region', row='event'

Plot configuration

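These arguments are passed to the plotting function. Not every function accepts every argument: for example, aspect is only available in figure-level functions such as sns.relplot(), sns.displot() and sns.catplot().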

Action Code Details
Flip axes
orient='y'
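Log-scale with base b
log_scale=b
For distribution plots such as sns.histplot(). A pair sets the x and y axes independently, for example log_scale=(False, b) for the y axis only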
Format axis with comma as thousands separator
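import matplotlib.ticker as ticker
# Scale a copy so that the shared iris DataFrame is not modified
scaled = iris.assign(sepal_length=iris['sepal_length'] * 1000)
ax = sns.histplot(scaled, x='sepal_length')

# histplot returns a matplotlib Axes, so its axis formatter can be replaced
ax.xaxis.set_major_formatter(
    ticker.FuncFormatter(lambda x, pos: format(int(x), ','))
)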
Very tedious
Format axis with percentage labels
import matplotlib.ticker as ticker
ax = sns.ecdfplot(iris, x='sepal_length', stat='percent')

ax.yaxis.set_major_formatter(ticker.PercentFormatter())
PercentFormatter assumes percent input by default (xmax=100), so a proportion of 1.0 would be labeled as 1%, not 100%. For proportion data, pass xmax=1.0
1:1 aspect ratio
aspect=1
Hide legend
legend=False
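
The two axis-formatting rows above can be tried without drawing a plot. The following sketch shows matplotlib's PercentFormatter with xmax=1.0 for proportion data, and StrMethodFormatter as a shorter alternative to the FuncFormatter lambda for thousands separators. Both are standard matplotlib.ticker classes; using them here in place of the original snippets is a suggestion, not the only option.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.ticker as ticker

# Proportion data (0..1): tell the formatter that 1.0 corresponds to 100%.
percent_fmt = ticker.PercentFormatter(xmax=1.0, decimals=0)
percent_label = percent_fmt.format_pct(0.5, 1.0)  # value 0.5 on an axis spanning 1.0

# Thousands separator via a format string instead of a lambda.
thousands_fmt = ticker.StrMethodFormatter("{x:,.0f}")
thousands_label = thousands_fmt(5100, 0)

# Either formatter is attached to a plot the same way as in the rows above, e.g.
#   ax.yaxis.set_major_formatter(percent_fmt)
#   ax.xaxis.set_major_formatter(thousands_fmt)
```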