Tutorial 13: Plotting with iplot

This short tutorial introduces the iplot module, which wraps the interactive plots we built last week in a simple interface. Make sure you have downloaded the module, then import it here:

In [1]:
import iplot

assert iplot.__version__ >= 1

The main function that you'll need in the iplot module (for now at least) is called create_figure. Here is its help page:

In [2]:
help(iplot.create_figure)
Help on function create_figure in module iplot:

create_figure(df, x, y, color=None, url=None, title='', x_axis_label=None, y_axis_label=None, nsizes=25)
    Creates an interactive plot from a pandas data frame.
    
    Args:
        df: A pandas data frame.
        x: Name of the x-variable.
        y: Name of the y-variable.
        color: Optional name of the color variable.
        url: Optional name of the url variable.
        title: String giving the title of the plot.
        x_axis_label: String to label to the x-axis with.
        y_axis_label: String to label to the y-axis with.
        nsizes: Maximum number of colors to include in continuous plot.
    Returns:
        A bokeh plot object.

The function takes a pandas DataFrame object and the names (as strings) of the x and y coordinates in the interactive plot. You can also optionally give the name of a variable to map to color and a column that contains a clickable url. Other options set the title and labels of the axes.

To show how this function works, let's load pandas and import a CSV file from my website. This is a classic dataset that I use in my intro stats class, showing the number of boys and girls born in London each year from 1629 to 1710.

In [3]:
import pandas as pd

df = pd.read_csv("https://statsmaths.github.io/stat_data/arbuthnot.csv")
df.head()
Out[3]:
   head_of_state  year  boys  girls  total  boy_to_girl_ratio
0      Charles I  1629  5218   4683   9901              1.114
1      Charles I  1630  4858   4457   9315              1.090
2      Charles I  1631  4422   4102   8524              1.078
3      Charles I  1632  4994   4590   9584              1.088
4      Charles I  1633  5158   4839   9997              1.066
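
Before plotting, it can help to check that the derived columns mean what their names suggest. The sketch below assumes (from the column names only, not from any documentation of the file) that total is boys plus girls and that boy_to_girl_ratio is boys divided by girls, rounded to three decimals. It uses only the five rows shown in the output above, typed in by hand, so it runs without downloading anything.

```python
# First five rows copied from the df.head() output:
# (year, boys, girls, total, boy_to_girl_ratio)
rows = [
    (1629, 5218, 4683, 9901, 1.114),
    (1630, 4858, 4457, 9315, 1.090),
    (1631, 4422, 4102, 8524, 1.078),
    (1632, 4994, 4590, 9584, 1.088),
    (1633, 5158, 4839, 9997, 1.066),
]

def check_row(year, boys, girls, total, ratio):
    """Return True if total and ratio agree with the boys and girls counts."""
    # Assumption: total = boys + girls.
    total_ok = (boys + girls == total)
    # Assumption: ratio = boys / girls, displayed to three decimals,
    # so allow half a unit in the last displayed digit.
    ratio_ok = abs(boys / girls - ratio) < 0.0005 + 1e-9
    return total_ok and ratio_ok

all_ok = all(check_row(*row) for row in rows)
print(all_ok)
```

The same check could be written against the full data frame with pandas column arithmetic; the plain-Python version is used here only to keep the example self-contained.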

If we wanted to construct an interactive plot with year on the x-axis and total births on the y-axis, this is the code to use:

In [4]:
p = iplot.create_figure(df, 'year', 'total')
iplot.show(p)

Notice that hovering over a point shows all of the other columns in the dataset.
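
The source of iplot is not shown in this tutorial, so how it builds the hover box is not something we can confirm here. One common way to get this behavior in Bokeh is to give the hover tool a list of (label, field) pairs, one per column, where "@name" refers to a column of the plot's data. The helper make_tooltips below is hypothetical and only sketches that idea.

```python
def make_tooltips(columns):
    """Build Bokeh-style hover tooltips, one (label, field) pair per column.

    Hypothetical helper; not part of iplot.
    """
    # In Bokeh tooltips, "@{name}" looks up the column called name in the
    # plot's data source. The braces are required for names with spaces and
    # harmless otherwise.
    return [(col, "@{" + col + "}") for col in columns]

columns = ["head_of_state", "year", "boys", "girls", "total", "boy_to_girl_ratio"]
tooltips = make_tooltips(columns)
print(tooltips[0])
```

A list like this is what Bokeh's HoverTool accepts through its tooltips argument.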

Adding color

To add color to the plot, specify the name of the color variable. Here is an example of a discrete color:

In [5]:
p = iplot.create_figure(df, 'year', 'total', color='head_of_state')
iplot.show(p)
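
For a discrete variable, each distinct value needs its own color. The sketch below shows one simple way to do that assignment. It is an assumption about the general technique, not a description of what iplot does internally, and both the palette and the "ruler B" / "ruler C" labels are placeholders invented for the example.

```python
def assign_colors(values, palette):
    """Map each distinct value to a palette color, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            # Cycle through the palette if there are more categories than colors.
            mapping[value] = palette[len(mapping) % len(palette)]
    return mapping

palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]  # hypothetical three-color palette
# "Charles I" comes from the data shown earlier; the other labels are placeholders.
heads = ["Charles I", "Charles I", "ruler B", "ruler C", "ruler B"]
colors = assign_colors(heads, palette)
print(colors)
```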

And here, I'll color it with a continuous variable, the total number of births:

In [6]:
p = iplot.create_figure(df, 'year', 'total', color='total')
iplot.show(p)
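
The help page describes nsizes as the maximum number of colors in a continuous plot, which suggests that continuous values are grouped into a limited number of bins before being colored. The exact binning rule is not documented above, so the sketch below assumes equal-width bins between the smallest and largest value. It uses the five totals from the df.head() output.

```python
def bin_index(value, lo, hi, nbins):
    """Return which of nbins equal-width bins on [lo, hi] contains value."""
    if hi == lo:
        # All values identical: everything goes in the first bin.
        return 0
    frac = (value - lo) / (hi - lo)
    # Clamp so that value == hi falls in the last bin rather than past it.
    return min(int(frac * nbins), nbins - 1)

totals = [9901, 9315, 8524, 9584, 9997]  # first five values of the total column
lo, hi = min(totals), max(totals)
# nbins=25 matches the default value of nsizes in the help page.
bins = [bin_index(t, lo, hi, nbins=25) for t in totals]
print(bins)
```

Each bin index would then be looked up in a palette with 25 entries, so nearby totals share a color.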

Now, set the title and axis labels.

In [7]:
p = iplot.create_figure(df, 'year', 'total', color='total',
                        title='A fun interactive plot!',
                        x_axis_label='Year',
                        y_axis_label='Total number of births')
iplot.show(p)

Adding a URL works similarly, but the function is currently set up to work only with Wikipedia pages. Setting the URL variable to head_of_state somewhat works here, because most heads of state have a Wikipedia page, though the link may not land on exactly the right one:

In [8]:
p = iplot.create_figure(df, 'year', 'total', color='total', url='head_of_state',
                        title='A fun interactive plot!',
                        x_axis_label='Year',
                        y_axis_label='Total number of births')
iplot.show(p)
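
To see why the links may not land on exactly the right page, it helps to look at how a Wikipedia address is formed from a page title. The helper wikipedia_url below is hypothetical (iplot's own link-building code is not shown here); it only illustrates the usual title-to-URL convention.

```python
from urllib.parse import quote

def wikipedia_url(title, lang="en"):
    """Build a Wikipedia article URL from a page title (hypothetical helper)."""
    # Wikipedia titles use underscores in place of spaces; quote() escapes
    # any remaining characters that are not safe in a URL path.
    slug = quote(title.strip().replace(" ", "_"))
    return f"https://{lang}.wikipedia.org/wiki/{slug}"

print(wikipedia_url("Charles I"))
# prints https://en.wikipedia.org/wiki/Charles_I
```

Because the address is built from the name alone, a short name shared by several people can lead to a different article than the one intended.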

Note: I expect the specific functions here to change as we see what works best and what else you might need. I think there is enough here, though, to get us started.