These are some informal notes taken while reading about the Python Seaborn package. Seaborn is a wrapper on top of matplotlib that is used for creating common ‘hard to make’ matplotlib plots, and for making them in an aesthetically pleasing manner.

Types of plots

  • sns.boxplot()  – generic boxplot
  • sns.distplot()  – histogram and kernel density estimate (KDE) plotted together
    • sns.distplot(rug=True)  – rugplot
  • sns.kdeplot()  – kernel density estimate plot
    • sns.kdeplot(n_levels)  – set the n_levels parameter high to draw more contour levels, which makes a two-dimensional KDE look finer
  • sns.rugplot()  – rugplot
  • sns.jointplot()  – show a scatterplot and marginal histogram for two-dimensional data.
    • sns.jointplot(kind='hexbin')  – hexbin plot, like a two-dimensional histogram
    • sns.jointplot(kind='kde') – two-dimensional KDE (might take a while to plot for large datasets)
    • sns.jointplot(kind='reg')  – scatterplot, regression line and confidence interval
    • The sns.jointplot()  function returns a JointGrid object, which you can exploit by saving the result and then adding to it whatever you feel like. Some examples:

  • sns.pairplot()  – used for exploring the relationships between variables in a data frame. By default, plots a scatterplot matrix on off-diagonals and histograms on diagonals. Similar to the R function ggpairs()  in the GGally package.
    • Similar to how jointplot()  returns a JointGrid, pairplot()  returns a PairGrid with its own set of methods available to it. You can use this to change what graphs are plotted:
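
As a rough sketch of changing what gets plotted, you can build the PairGrid yourself and map a different function onto the diagonal and the off-diagonal cells. The data frame below is random noise with made-up column names (a, b, c); the choice of plt.hist and plt.scatter is just one possible combination.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Three columns of random noise; the names are invented for this sketch.
rng = np.random.RandomState(1)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

g = sns.PairGrid(df)
g.map_diag(plt.hist)        # histograms on the diagonal
g.map_offdiag(plt.scatter)  # scatterplots in every other cell
```

PairGrid also has map_upper() and map_lower() if you want different plots above and below the diagonal.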

  • sns.stripplot() – Like a scatterplot, but one of the variables is categorical
    • sns.stripplot(jitter=True)  – stops the points from overlapping as much
  • sns.swarmplot()  – beeswarm plot that works like stripplot()  above, but avoids overlap entirely.
    • sns.swarmplot(hue)  – set the ‘hue’ parameter to use colour to distinguish levels of a variable – e.g. blue for male, red for female
  • sns.violinplot()  – draw a violinplot with a boxplot inside it.
    • sns.violinplot(hue, split=True)  – if the ‘hue’ variable has two levels, then you can split it so each half of the violin shows one level and the violin plots won’t be symmetrical
    • sns.violinplot(inner='stick')  – show the individual observations inside the violin plot, rather than a boxplot
  • sns.barplot()  – standard barplot, complete with bootstrapped confidence intervals
  • sns.countplot()  – histogram over a categorical variable, as opposed to the regular histogram which is over a continuous variable
  • sns.pointplot()  – plot the interaction between variables using scatter plot glyphs:
An example pointplot using the Titanic dataset.
  • sns.factorplot()  – draw multiple plots on different facets of your data. Combines plots (like the ones above) with a FacetGrid, which is a subplot grid that comes with a range of methods.
    • sns.factorplot(kind)  – specify the type of your plot. Choose between ‘point’, ‘bar’, ‘count’, ‘box’, ‘violin’ and ‘strip’. ‘Swarm’ seems to work too, at least according to the official tutorial (use a Find search to find the example)
  • sns.regplot()  – plot a scatterplot, simple linear regression line and 95% confidence intervals around the regression line. Accepts x and y variables in a variety of formats. Subset of sns.lmplot()
  • sns.lmplot()  – like sns.regplot() , but requires a data parameter and the column names to plot specified as strings.
    • sns.lmplot(x_jitter)  – add jitter in the x-direction. Useful when making plots where one of the variables takes discrete values.
    • sns.lmplot(x_estimator)  – instead of points, plot an estimate of central tendency (like a mean) and a range
    • sns.lmplot(order)  – fit non-linear trends with a polynomial (applies to regplot too)
    • sns.lmplot(robust=True)  – fit robust regression, down-weighting the impact of outliers
    • sns.lmplot(logistic=True)  – logistic regression
    • sns.lmplot(lowess=True)  – fit a scatterplot smoother
    • sns.lmplot(hue)  – fit separate regression lines to levels of a categorical variable
    • sns.lmplot(col)  – create facets along levels of a categorical variable
  • sns.residplot()  – fits a simple linear regression, calculates residuals and then plots them
  • sns.heatmap()  – takes rectangular data and plots a heatmap
  • sns.clustermap()  – hierarchically clustered heatmap
  • sns.tsplot()  – time series plotting function. Has the option to include uncertainty, bootstrap resamples, a range of estimators and error bars.
  • sns.lvplot()  – letter value plot, which is like a better boxplot for when you have a high number of data points
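
To make the lmplot() options above a bit more concrete, here is a small sketch that combines hue and col. The data is simulated and the column names (dose, response, group) are invented for the example; the point is only that column names are passed as strings alongside a data argument, and that the result is a FacetGrid.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import pandas as pd
import seaborn as sns

# Simulated data: two groups with different slopes (all names are made up).
rng = np.random.RandomState(2)
n = 120
df = pd.DataFrame({
    "dose": rng.uniform(0, 10, size=n),
    "group": np.repeat(["control", "treated"], n // 2),
})
slope = np.where(df["group"] == "treated", 1.5, 0.5)
df["response"] = slope * df["dose"] + rng.normal(size=n)

# One regression line per level of "group", each drawn in its own facet.
g = sns.lmplot(x="dose", y="response", hue="group", col="group", data=df)
```

Dropping col="group" would draw both regression lines on a single set of axes instead.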

Miscellaneous functions

  • sns.get_dataset_names() – list all the toy datasets available on the Seaborn online repository
  • sns.load_dataset()  – load a dataset from the Seaborn online repository
  • sns.FacetGrid , sns.PairGrid , sns.JointGrid  – grids of subplots used for plotting, each somewhat different and each with their own set of methods
  • sns.despine()  – remove top and right axes, making the plot look better
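
A quick sketch of despine() on a plain matplotlib plot (the plotted numbers are arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Hide the top and right spines of this axes; left and bottom are kept.
sns.despine(ax=ax)
```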

Controlling aesthetics

  • sns.set() – set plotting options to seaborn defaults. Can be used to reset plot parameters to their default values.
  • sns.set_style()  – change the default plot theme
  • sns.set_context() – change the default plot context. Used to scale the plots up and down. Options are paper , notebook , talk  and poster , in order from smallest to largest scale.
  • sns.axes_style()  – temporarily set plot parameters, often used for a single plot. For example:
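
A minimal sketch of the temporary use, assuming the "whitegrid" style and some arbitrary numbers to plot. Used in a with block, axes_style() only affects plots created inside the block; the variables grid_before, grid_inside and grid_after are there purely to show that the setting is restored afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import seaborn as sns

grid_before = plt.rcParams["axes.grid"]

# The style applies only to axes created inside the with block.
with sns.axes_style("whitegrid"):
    grid_inside = plt.rcParams["axes.grid"]
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [1, 3, 2])

# On leaving the block the previous setting is back in place.
grid_after = plt.rcParams["axes.grid"]
```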

Working with colour

  • sns.color_palette()  – return the list of colours in the current palette
    • The ‘hls’ colour palette is one option; see the list of colours with  sns.palplot(sns.color_palette("hls", 8)) .
    • Another (better) option is the husl system; see the list of colours with  sns.palplot(sns.color_palette("husl", 8))
    • Use ‘Paired’ to access ColorBrewer colours:  sns.palplot(sns.color_palette("Paired")) . Likewise you can put in other parameters; for example,  sns.palplot(sns.color_palette("Set2", 10)) for the “Set2” palette.
    • Tack on ‘_r’ with ColorBrewer palettes to reverse the colour order. Compare the difference between  sns.palplot(sns.color_palette("BuGn_r")) and  sns.palplot(sns.color_palette("BuGn")) .
    • Tack on  ‘_d’ with ColorBrewer palettes to create darker palettes than usual. See  sns.palplot(sns.color_palette("GnBu_d")) compared to  sns.palplot(sns.color_palette("GnBu"))
  • sns.palplot()  – plot colours in a palette in a horizontal array
  • sns.hls_palette()  – more customisation of the ‘hls’ palette
  • sns.husl_palette()  – more customisation of the ‘husl’ palette
  • sns.cubehelix_palette()  – more customisation of the ‘cubehelix’ palette
  • sns.light_palette()  and sns.dark_palette() – sequential palettes for sequential data.
  • sns.diverging_palette()  – create a diverging palette between two colours, for data with a meaningful midpoint
  • sns.choose_colorbrewer_palette()  – launch an interactive widget to help you choose ColorBrewer palettes. Must be used in a Jupyter notebook.
  • sns.choose_cubehelix_palette()  – similar to sns.choose_colorbrewer_palette() , but for the cubehelix colour palette.
  • sns.choose_light_palette()  and sns.choose_dark_palette() – launch interactive widget to aid the choice of palette.
  • sns.choose_diverging_palette()  – launch an interactive widget to help you choose a diverging palette
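
The palette calls mentioned in this section can be tried together in one short sketch. Nothing here is new beyond the variable names, which are made up; each call returns a list of RGB tuples, and palplot() draws them as a strip of colours.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import seaborn as sns

husl = sns.color_palette("husl", 8)          # eight colours from the husl system
reversed_bugn = sns.color_palette("BuGn_r")  # ColorBrewer palette, reversed with '_r'
dark_gnbu = sns.color_palette("GnBu_d")      # darker variant, made with '_d'

# Draw one of the palettes as a horizontal array of colours.
sns.palplot(husl)
```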

Using colour palettes

Use the cmap  argument to pass a colour palette to a Seaborn plotting function that accepts a colormap, such as sns.heatmap():
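
A minimal sketch, assuming a heatmap of random numbers and a palette built with light_palette(). The as_cmap=True argument asks for a matplotlib colormap rather than a list of colours, which is the form cmap expects.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import seaborn as sns

# Random rectangular data, purely for illustration.
rng = np.random.RandomState(3)
data = rng.rand(6, 8)

# Build a sequential palette as a colormap and hand it to the plot.
cmap = sns.light_palette("green", as_cmap=True)
ax = sns.heatmap(data, cmap=cmap)
```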

You can also use the set_palette() function that changes the default matplotlib parameters so the palette is applied to all plots:
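
A small sketch of set_palette(), assuming the husl palette with four colours. After the call, ordinary matplotlib lines pick up the new colour cycle without any extra arguments.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import seaborn as sns

# Make a four-colour husl palette the default colour cycle.
sns.set_palette("husl", 4)
current = sns.color_palette()  # with no arguments, returns the current palette

# Plain matplotlib lines now cycle through the new palette.
fig, ax = plt.subplots()
lines = [ax.plot([0, 1], [i, i + 1])[0] for i in range(4)]
```

Calling sns.set() afterwards resets the palette along with the other Seaborn defaults.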

2 thoughts on “A quick overview of Seaborn”

  1. thanks for this. Do you know how to specify the color of a fit line in distplot? I know the color argument will specify the color of the kde fit, but what about for a gaussian fit if specifying fit=stats.gamma?

    1. I figured out how to change the color. You add this argument: fit_kws={'color':'red'}.
