12. Interactive Data Visualizations and Dashboards
12.1. Introduction: Why Make Visualizations Interactive?
There’s a great moment in Jurassic Park when Dr. Ian Malcolm is speaking to John Hammond, proprietor of Jurassic Park and owner of the company that clones dinosaurs.
from IPython.display import IFrame
IFrame(src="https://www.youtube.com/embed/4PLvdmifDSk", width="560", height="315")
Hammond says to Malcolm:
I don’t think you’re giving us our due credit, our scientists have done things which nobody’s ever done before.
And Malcolm replies:
Yeah, but your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.
Putting aside Malcolm’s attack on basic tenets of scientific methodology (he criticizes the project for citing previous research as if it is theft), there is a point here regarding data visualization or any new tool for data analysis. Just because a tool looks impressive does not mean that it is a good choice for our purposes. We can make data visualizations that are interactive and pre-loaded with many clickable features, and we can host these visualizations on sleek websites. Interactive visualizations are impressive, but that does not mean that they are more effective than static visualizations generated by matplotlib, pandas, or seaborn at revealing important properties of data or at communicating a finding to a general audience in an understandable and memorable way. If we generate an interactive visualization, there should be a compelling reason why the graphic should be interactive: we need to use interactivity to achieve a purpose that we cannot achieve with a static visualization. In that way, interactivity is like any other aesthetic in a figure. Edward Tufte calls aesthetics with no purpose chartjunk, and interactivity for the sake of interactivity is chartjunk.
So what can an interactive visualization accomplish that a static visualization cannot?
First, interactive visualizations can make annotations more useful and less overwhelming. If we generate a barplot, for example, we can label each bar with the number that the bar represents. But if there are many bars, then we would need to include many numbers in a single barplot. In a static visualization, the numbers can overwhelm the audience as there is a lot more data to try to understand. Interactively, it is possible to hide or reveal these annotations based on the user’s preferences at any particular moment. All of these numeric labels can be hidden until a user hovers the mouse over a bar, and at that point the annotation pops up in a separate box. That allows the visualization to display only the data that the user wants to see, while hiding the rest of the data. Because the data are so carefully curated in this way, it is possible to include many more annotations in an interactive barplot than in a static one. Interactivity also makes maps much more useful because data regarding a geographic location can be displayed when the mouse hovers over that area of the map without being limited by the different sizes of geographic areas.
Second, interactive graphics give a user the ability to zoom in on a portion of the visualization. Some graphs, such as barplots, have little or no need for a zoom function. But zooming is very useful for other graphics, especially scatterplots in which each point has an annotation that appears when the mouse hovers over the dot. Zooming is also useful for lineplots in which a user wants to examine a subset of the x-axis or reduce the y-axis to emphasize changes in the features.
Third, interactivity can be used to generate animations. Animations are useful for illustrating changes over time. However, animations are poor ways to represent other kinds of data because they cannot show the entirety of the data at once.
Finally, interactive graphics allow users to input parameters to generate a new visualization on the fly. For example, if the data have 10 continuous-valued features, then we can create an interactive graph that allows the user to choose the features on the x and y-axes from drop down menus. This type of user-guided visualization is not appropriate if we are trying to present a specific set of findings to the users, but it is worthwhile if our purpose is to allow the users to more easily explore the data.
If there is no need for pop-out annotations, zooming, animation, or user-supplied inputs, then static visualizations are better than interactive ones because we can exert more control over the appearance of the visualization and because we can write about what the figure shows with confidence that every member of our audience sees the same visualization. As Ian Malcolm says, just because we can include interactivity doesn’t mean we should. Interactivity adds complexity to a figure, so there needs to be a compelling justification for interactivity.
One interactive visualization we will NOT discuss is a 3D visualization. In chapter 26 of Fundamentals of Data Visualization, Claus O. Wilke demonstrates that “the projection of 3D objects into two dimensions for printing or display on a monitor distorts the data.” Because the orientation of the camera in a 3D image is arbitrary, objects like bars can appear larger or smaller than they actually are depending on the perspective in the image, and objects can appear to be higher or lower along a meaningful axis than they should be. 3D images can almost always be more accurately and simply visualized with a grid of 2D images.
12.2. Creating Interactive Data Visualizations with plotly
The most widely used package for creating interactive data visualizations in both R and Python is plotly. In Python, there are several interfaces to plotly, contained in different modules of the plotly package. We will primarily use the plotly.express module, which we will alias as px, but some graphics use the plotly.graph_objects module, aliased as go. Here we load both modules along with pandas and numpy:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
#import plotly.offline as pyo
#pyo.init_notebook_mode() ## ensures that the plotly graphics convert to HTML
The difference between plotly.graph_objects and plotly.express is similar to the difference between matplotlib and seaborn. plotly.express is a wrapper for plotly.graph_objects just like seaborn is a wrapper for matplotlib, and like seaborn, plotly.express is designed to produce visualizations that are prettier by default and use less code to generate.
We will once again be using the 2019 American National Election pilot study to demonstrate interactive visualizations:
anes = pd.read_csv("https://github.com/jkropko/DS-6001/raw/master/localdata/anes_pilot2019_clean.csv")
anes.columns
Index(['caseid', 'liveurban', 'vote16', 'protest', 'vote',
'most_important_issue', 'confecon', 'ideology', 'partyID',
'universal_income', 'family_separation', 'free_college',
'forgive_loans', 'race', 'birthyr', 'sex', 'education', 'weight',
'fttrump', 'ftobama', 'ftbiden', 'ftwarren', 'ftsanders', 'ftbuttigieg',
'ftharris', 'ftblack', 'ftwhite', 'fthisp', 'ftasian', 'ftmuslim',
'ftillegal', 'ftjournal', 'ftnato', 'ftun', 'ftice', 'ftnra', 'ftchina',
'ftnkorea', 'ftmexico', 'ftsaudi', 'ftukraine', 'ftiran', 'ftbritain',
'ftgermany', 'ftjapan', 'ftisrael', 'ftfrance', 'ftcanada', 'ftturkey',
'ftrussia', 'ftpales', 'ftimmig', 'partisanship', 'ftbiden_level',
'age', 'age2', 'ftbiden_float', 'ftbiden_cat', 'ftbiden_str',
'prefersbiden', 'worried_econ', 'favor_both'],
dtype='object')
In this module, we will also generate state maps. The following code extracts the data on which state each individual lives in and merges this information into the ANES data:
%%capture
anes_state = pd.read_csv("https://github.com/jkropko/DS-6001/raw/master/localdata/anes_pilot_2019.csv")
anes_state = anes_state[['caseid', 'inputstate']]
anes_state['state'] = anes_state['inputstate'].map({1:'Alabama',2:'Alaska',60:'American Samoa',
3:'American Samoa',4:'Arizona',5:'Arkansas',
81:'Baker Island',6:'California',7:'Canal Zone',
8:'Colorado',9:'Connecticut',10:'Delaware',
11:'District of Columbia',12:'Florida',
64:'Federated States of Micronesia',13:'Georgia',
14:'Guam',66:'Guam',15:'Hawaii',84:'Howland Island',
16:'Idaho',17:'Illinois',18:'Indiana',19:'Iowa',
86:'Jarvis Island',67:'Johnston Atoll',20:'Kansas',
21:'Kentucky',89:'Kingman Reef',22:'Louisiana',
23:'Maine',68:'Marshall Islands',24:'Maryland',
25:'Massachusetts',26:'Michigan',71:'Midway Islands',
27:'Minnesota',28:'Mississippi',29:'Missouri',
30:'Montana',76:'Navassa Island',31:'Nebraska',
32:'Nevada',33:'New Hampshire',34:'New Jersey',
35:'New Mexico',36:'New York',37:'North Carolina',
38:'North Dakota',69:'Northern Mariana Islands',
39:'Ohio',40:'Oklahoma',41:'Oregon',70:'Palau',
95:'Palmyra Atoll',42:'Pennsylvania',43:'Puerto Rico',
72:'Puerto Rico',44:'Rhode Island',45:'South Carolina',
46:'South Dakota',47:'Tennessee',48:'Texas',
74:'U.S. Minor Outlying Islands',49:'Utah',
50:'Vermont',51:'Virginia',
52:'Virgin Islands of the U.S.',
78:'Virgin Islands of the U.S.',79:'Wake Island',
53:'Washington',54:'West Virginia',55:'Wisconsin',
56:'Wyoming'})
anes_state['state_abb'] = anes_state['inputstate'].map({1:'AL',2:'AK',60:'AS',3:'AS',4:'AZ',5:'AR',
81:'UM',6:'CA',7:'CZ',8:'CO',9:'CT',10:'DE',
11:'DC',12:'FL',64:'FM',13:'GA',
14:'GU',66:'GU',15:'HI',84:'UM',
16:'ID',17:'IL',18:'IN',19:'IA',
86:'UM',67:'UM',20:'KS', 21:'KY',89:'UM',22:'LA',
23:'ME',68:'UM',24:'MD',25:'MA',26:'MI',71:'UM',
27:'MN',28:'MS',29:'MO',30:'MT',76:'UM',31:'NE',
32:'NV',33:'NH',34:'NJ',35:'NM',36:'NY',37:'NC',
38:'ND',69:'MP',39:'OH',40:'OK',41:'OR',70:'PW',
95:'UM',42:'PA',43:'PR',72:'PR',44:'RI',45:'SC',
46:'SD',47:'TN',48:'TX',74:'UM',49:'UT',
50:'VT',51:'VA',52:'VI',78:'VI',79:'UM',
53:'WA',54:'WV',55:'WI',56:'WY'})
anes_state = anes_state.rename({'inputstate':'stateID'}, axis=1)
anes = pd.merge(anes, anes_state, on='caseid', validate='one_to_one')
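The validate='one_to_one' argument in the merge above asks pandas to confirm that the key is unique in both dataframes. The following sketch, which uses small made-up dataframes in place of the ANES files, shows what happens when that assumption holds and when it fails.

```python
import pandas as pd

# Made-up stand-ins for the two ANES tables
left = pd.DataFrame({"caseid": [1, 2, 3], "vote": ["A", "B", "A"]})
right_ok = pd.DataFrame({"caseid": [1, 2, 3], "state": ["VA", "MD", "DC"]})
right_dup = pd.DataFrame({"caseid": [1, 1, 2], "state": ["VA", "VA", "MD"]})

# Unique keys on both sides: the merge succeeds
merged = pd.merge(left, right_ok, on="caseid", validate="one_to_one")

# A repeated key on the right side: pandas raises a MergeError
try:
    pd.merge(left, right_dup, on="caseid", validate="one_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
```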
12.2.1. Barplots and How to Use plotly Graphics
As with matplotlib, the pandas .plot() method, and seaborn, to create a barplot we must first generate a dataframe that contains the categories and the values we intend to plot. The following code gives us a small dataframe with the vote choices of the people in the ANES as of December 2019, and the frequency of each choice:
anes_bar = anes.vote.value_counts().reset_index()
anes_bar
| | index | vote |
---|---|---|
0 | Joe Biden | 1288 |
1 | Donald Trump | 1273 |
2 | Someone else | 321 |
3 | Probably will not vote | 283 |
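The column names produced by .value_counts().reset_index() depend on the pandas version: releases from 2.0 onward name the columns after the feature and 'count' rather than 'index' and the feature. The sketch below, on a made-up series, is one way to get the column names shown above regardless of version by naming both columns explicitly.

```python
import pandas as pd

# Made-up vote choices, for illustration only
votes = pd.Series(["Biden", "Trump", "Biden", "Other", "Biden", "Trump"],
                  name="vote")

# Name the category column 'index' and the count column 'vote' explicitly,
# so the result does not depend on version-specific defaults
counts = (votes.value_counts()
               .rename_axis("index")
               .reset_index(name="vote"))
```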
To generate an interactive barplot, use the px.bar() function. This function requires a dataframe and an x and y feature. If we set the categories to x and the counts to y then we generate a vertically-oriented barplot:
px.bar(anes_bar, x='index', y='vote')
Take a moment to notice the various ways in which this barplot is different from the static barplots we created in the previous chapter. First, when we hover the mouse over the bars, the name of the category and the specific count appear in a box at the top of the bar. Second, there are many icons in the upper-right corner of the plot. These icons, listed here from left to right, have the following functions:
The camera allows a user to download the image as a .png file. Downloading the image is like taking a screenshot: it removes all of the interactive elements of the figure. We download whatever version of the figure is currently being displayed, so if we made any changes like zooming in on a specific region, these changes carry over to the download.
The next four icons - the magnifying glass, the cross made of arrows, the dashed square, and the oval with a tail - are tools that the user can employ to change how the figure is displayed. The magnifying glass allows the user to click and drag a rectangle onto the figure to zoom in on that rectangle. The cross made of arrows allows the user to click and drag the image to pan to different parts of the graph (which becomes useful if the graph is already zoomed in on a region). The dashed rectangle and the oval are selection tools that allow a user to highlight elements of the graph, like a particular bar, within either a rectangular or a user-drawn region.
The plus sign zooms in on the center of the graph, and the minus sign zooms out.
The bracketed, crossed arrows set the zoom at a level that is automatically selected to frame the entirety of the data in a way that removes most of the marginal space around the graph, and the house resets the zoom to the default level that appears when the figure is first generated.
The right angle with dotted lines and a point turns on “spike lines”: dotted lines that connect the user’s mouse to the corresponding positions on the x and y-axes. This feature is especially useful for scatterplots when we want to see the exact x and y-coordinates of a point.
The single or double rectangles with points on their left sides allow a user to choose either “Show closest data on hover”, which is the default, or “Compare data on hover”. “Show closest data on hover” displays annotations only when the user’s mouse hovers directly over a data element in the graph. If the mouse is not directly touching an element like a bar, line, or point, then no data will be displayed. In contrast, “Compare data on hover” always shows the annotations for the closest element on the categorical axis for bars, or on both axes for scatterplots and line plots.
Finally, the barplot icon on the right-hand side links to https://plotly.com/.
Try these features out on the graphs that appear in this notebook.
We can color-code the bars by setting the color argument equal to the categorical feature in the barplot. We can also add axis labels to the plotly barplot by specifying a dictionary that maps the features to labels and passing this dictionary to the labels argument, and we can add a title by passing a string to the title argument:
px.bar(anes_bar, x='index', y='vote', color='index',
labels={'vote':'Number of voters', 'index':'Vote choice'},
title = 'Vote choice as of December 2019')
If we want greater control over the appearance of the figure, we need to set the figure equal to a Python variable and use methods to add additional aesthetics or make edits to the figure. For example, titles are left-justified in plotly figures by default, but we can center the title by typing fig.update(layout=dict(title=dict(x=0.5))) after creating the figure and saving it as fig. When we use colors, a legend will appear automatically, but if we don’t want a legend, we can turn it off with fig.update_layout(showlegend=False). Finally, to display the figure, we use the .show() method:
fig = px.bar(anes_bar, x='index', y='vote', color='index',
labels={'vote':'Number of voters', 'index':'Vote choice'},
title = 'Vote choice as of December 2019')
fig.update_layout(showlegend=False)
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
With plotly we can define hover data: a set of features whose values are displayed when the user’s mouse hovers over an element of the graph. We can add many additional features to the graph with hover data. For example, the following code creates a dataframe with the total votes for each candidate, the average feeling thermometer ratings for Biden and Trump by group, and the vote percent for each candidate:
anes_bar = anes.groupby('vote', sort=False).agg({'vote':'size',
'ftbiden':'mean',
'fttrump':'mean'})
anes_bar = anes_bar.rename({'vote':'votes'}, axis=1) #needed to avoid the same name as the index
anes_bar['ftbiden'] = round(anes_bar['ftbiden'],2)
anes_bar['fttrump'] = round(anes_bar['fttrump'],2)
anes_bar = anes_bar.reset_index()
anes_bar['percent'] = round(100*anes_bar['votes']/sum(anes_bar['votes']),2)
anes_bar
| | vote | votes | ftbiden | fttrump | percent |
---|---|---|---|---|---|
0 | Joe Biden | 1288 | 70.72 | 8.45 | 40.70 |
1 | Donald Trump | 1273 | 15.74 | 87.84 | 40.22 |
2 | Probably will not vote | 283 | 39.60 | 28.75 | 8.94 |
3 | Someone else | 321 | 34.02 | 23.13 | 10.14 |
In the following figure, we generate a barplot to illustrate the percents for the candidates, and we use hover data to annotate the bars with the votes and average feeling thermometers for each candidate. We pass these additional features as a list to the hover_data argument. Take a moment to hover your mouse over the bars in the following figure and look at the data that appear:
fig = px.bar(anes_bar, x='vote', y='percent', color='vote',
labels={'vote':'Vote choice', 'percent':'Percent'},
title = 'Vote choice as of December 2019',
hover_data = ['votes', 'ftbiden', 'fttrump'])
fig.update_layout(showlegend=False)
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
These boxes only show us the additional features for one bar at a time, which allows us to include many more features as annotations than we could include in a static barplot without overwhelming the user.
To generate a horizontal barplot, set the categorical feature as y and the feature that defines the length of each bar as x:
fig = px.bar(anes_bar, y='vote', x='percent', color='vote',
labels={'vote':'Vote choice', 'percent':'Percent'},
title = 'Vote choice as of December 2019',
hover_data = ['votes', 'ftbiden', 'fttrump'])
fig.update_layout(showlegend=False)
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
Even though the bars can be annotated with hover data, we might still want to label the bars directly. Annotating bars in plotly is much easier than labeling the bars in matplotlib, pandas, or seaborn. plotly.express plotting functions include an argument text that allows us to label the elements with values of another feature. In the following example, we create a feature “text” that contains the percents, converted to strings and with a % sign attached. We then pass this feature to the text argument to label the bars. The labels by default are placed within the bars, at the top, and centered:
anes_bar['text'] = anes_bar['percent'].astype(str) + '%'
fig = px.bar(anes_bar, x='vote', y='percent', color='vote',
labels={'vote':'Vote choice', 'percent':'Percent'},
title = 'Vote choice as of December 2019',
hover_data = ['votes', 'ftbiden', 'fttrump'],
text='text')
fig.update_layout(showlegend=False)
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
To demonstrate how to create grouped and faceted barplots using Python, we can use the following dataframe, which groups the anes data by vote choice and party affiliation, and collects the column and row percents - the percent within each party that chooses each voting option, and the percent within each voter group that belongs to each party - as well as the count and mean Biden and Trump thermometers within each group:
colpercent = round(100*pd.crosstab(anes.vote, anes.partyID, normalize='columns'),2).reset_index()
colpercent = pd.melt(colpercent, id_vars = 'vote', value_vars = ['Democrat', 'Republican', 'Independent'])
colpercent = colpercent.rename({'value':'colpercent'}, axis=1)
rowpercent = round(100*pd.crosstab(anes.vote, anes.partyID, normalize='index'),2).reset_index()
rowpercent = pd.melt(rowpercent, id_vars = 'vote', value_vars = ['Democrat', 'Republican', 'Independent'])
rowpercent = rowpercent.rename({'value':'rowpercent'}, axis=1)
votes = pd.crosstab(anes.vote, anes.partyID).reset_index()
votes = pd.melt(votes, id_vars = 'vote', value_vars = ['Democrat', 'Republican', 'Independent'])
votes = votes.rename({'value':'votes'}, axis=1)
ftb = pd.crosstab(anes.vote, anes.partyID, values=anes.ftbiden, aggfunc='mean').round(2).reset_index()
ftb = pd.melt(ftb, id_vars = 'vote', value_vars = ['Democrat', 'Republican', 'Independent'])
ftb = ftb.rename({'value':'Biden thermometer'}, axis=1)
ftt = pd.crosstab(anes.vote, anes.partyID, values=anes.fttrump, aggfunc='mean').round(2).reset_index()
ftt = pd.melt(ftt, id_vars = 'vote', value_vars = ['Democrat', 'Republican', 'Independent'])
ftt = ftt.rename({'value':'Trump thermometer'}, axis=1)
anes_groupbar = pd.merge(colpercent, rowpercent, on=['vote', 'partyID'], validate='one_to_one')
anes_groupbar = pd.merge(anes_groupbar, votes, on=['vote', 'partyID'], validate='one_to_one')
anes_groupbar = pd.merge(anes_groupbar, ftb, on=['vote', 'partyID'], validate='one_to_one')
anes_groupbar = pd.merge(anes_groupbar, ftt, on=['vote', 'partyID'], validate='one_to_one')
anes_groupbar['coltext'] = anes_groupbar['colpercent'].astype(str) + '%'
anes_groupbar['rowtext'] = anes_groupbar['rowpercent'].astype(str) + '%'
anes_groupbar
| | vote | partyID | colpercent | rowpercent | votes | Biden thermometer | Trump thermometer | coltext | rowtext |
---|---|---|---|---|---|---|---|---|---|
0 | Donald Trump | Democrat | 4.23 | 4.44 | 56 | 41.30 | 74.60 | 4.23% | 4.44% |
1 | Joe Biden | Democrat | 81.27 | 84.86 | 1076 | 72.10 | 6.79 | 81.27% | 84.86% |
2 | Probably will not vote | Democrat | 5.14 | 30.09 | 68 | 46.21 | 22.85 | 5.14% | 30.09% |
3 | Someone else | Democrat | 9.37 | 42.03 | 124 | 37.84 | 14.89 | 9.37% | 42.03% |
4 | Donald Trump | Republican | 88.22 | 83.74 | 1056 | 14.66 | 89.04 | 88.22% | 83.74% |
5 | Joe Biden | Republican | 5.18 | 4.89 | 62 | 66.56 | 23.85 | 5.18% | 4.89% |
6 | Probably will not vote | Republican | 2.09 | 11.06 | 25 | 37.29 | 37.96 | 2.09% | 11.06% |
7 | Someone else | Republican | 4.51 | 18.31 | 54 | 36.72 | 38.00 | 4.51% | 18.31% |
8 | Donald Trump | Independent | 28.17 | 11.82 | 149 | 13.22 | 84.93 | 28.17% | 11.82% |
9 | Joe Biden | Independent | 24.57 | 10.25 | 130 | 63.17 | 14.84 | 24.57% | 10.25% |
10 | Probably will not vote | Independent | 25.14 | 58.85 | 133 | 39.28 | 30.52 | 25.14% | 58.85% |
11 | Someone else | Independent | 22.12 | 39.66 | 117 | 28.10 | 24.67 | 22.12% | 39.66% |
We now have two categorical features to plot - vote choice and party affiliation - and several features to define the height of bars and to include as hover data. To create bars that are grouped for each candidate and color-coded by party, with bars for the same candidate placed side-by-side, we can set y to colpercent and color to partyID, and we can set barmode='group'. We also include the thermometers and the total votes in each bar as hover data, and we annotate the bars with text:
fig = px.bar(anes_groupbar, x='vote', y='colpercent', color='partyID',
labels={'vote':'Vote choice', 'colpercent':'Percent'},
title = 'Vote choice as of December 2019',
hover_data = ['votes', 'Biden thermometer', 'Trump thermometer'],
text='coltext',
barmode = 'group')
fig.update_layout(showlegend=True)
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
Stacking is another way to group bars that places one bar directly on top of another. To stack bars, we can change the barmode argument to 'stack'. In the following barplot, we orient the bars horizontally, we plot the vote choices on the y-axis and the row percents (the breakdown of each voter group across parties) on the x-axis, and we stack the bars by party affiliation. We also annotate the bars with the percents and include the count of each group and the mean thermometer ratings as hover data:
fig = px.bar(anes_groupbar, y='vote', x='rowpercent', color='partyID',
labels={'vote':'Vote choice', 'rowpercent':'Percent'},
title = 'Vote choice as of December 2019',
hover_data = ['votes', 'Biden thermometer', 'Trump thermometer'],
text='rowtext',
barmode = 'stack')
fig.update_layout(showlegend=True)
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
Instead of grouping bars side-by-side or stacking them, we can use faceting to show the data for different groups. To create three barplots of vote choice, one for each party affiliation, and to arrange these barplots in one row of a grid, we can set facet_col equal to partyID. By default, the different graphs have subtitles such as partyID=Democrat. To change these subtitles, we can use the .for_each_annotation() method with a lambda function that replaces the string partyID= with nothing, leaving only the category labels:
fig = px.bar(anes_groupbar, x='vote', y='colpercent', color='partyID',
facet_col='partyID',
hover_data = ['votes', 'Biden thermometer', 'Trump thermometer'],
labels={'vote':'Vote choice', 'colpercent':'Percent'},
title = 'Vote choice as of December 2019',
text='coltext')
fig.update(layout=dict(title=dict(x=0.5)))
fig.update_layout(showlegend=False)
fig.for_each_annotation(lambda a: a.update(text=a.text.replace("partyID=", "")))
fig.show()
If we add the facet_col_wrap argument, we can specify how many graphs to include on one row before moving to the next row. The following figure includes two graphs per row. We also adjust the height and width of the figure with the height and width arguments:
fig = px.bar(anes_groupbar, x='partyID', y='rowpercent', color='partyID',
facet_col='vote', facet_col_wrap=2,
hover_data = ['votes', 'Biden thermometer', 'Trump thermometer'],
labels={'partyID':'Party Identification', 'rowpercent':'Percent'},
title = 'Vote choice as of December 2019',
text='rowtext', width=1000, height=600)
fig.update(layout=dict(title=dict(x=0.5)))
fig.update_layout(showlegend=True)
fig.for_each_annotation(lambda a: a.update(text=a.text.replace("vote=", "")))
fig.show()
12.2.2. Scatterplots
The plotly.express syntax for scatterplots is very similar to the syntax for barplots. We use the px.scatter() function, we define the features for the x and y-axes, and we can adjust the height and width of the figure, the axis labels, the hover data, and the title in exactly the same way we did for barplots. Here is a scatterplot of the Biden and Trump thermometers for the first 200 rows of the anes data:
fig = px.scatter(anes.head(200), x='ftbiden', y='fttrump',
height=600, width=600,
labels={'ftbiden':'Joe Biden thermometer rating',
'fttrump':'Donald Trump thermometer rating'},
hover_data=['partyID', 'sex', 'state'],
title = 'Trump vs. Biden Feeling Thermometer Ratings')
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
Notice that the hover data appears above each point only when the user’s mouse hovers over the point. We can use this functionality to learn about outliers in the data. For example, there is only one person in this subset of the data who rates both Trump and Biden above 60. When we hover over this point, we can see that this person is a male Democrat from California who rates Biden at 65 and Trump at 78.
We can color-code the points by setting the color argument equal to a categorical feature. This feature, however, must not have any missing values. We must either recode or impute the data so that there are no missing values, or we need to delete the rows with missing values for the categorical feature. For the sake of simplicity, we create a version of the anes data in which the rows with missing values of partyID are removed:
anes_scatter = anes[~anes.partyID.isnull()]
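The same filtering can be written with the .dropna() method and its subset argument. The sketch below uses a made-up dataframe to show that the two approaches keep the same rows.

```python
import numpy as np
import pandas as pd

# Made-up data with two missing party labels
toy = pd.DataFrame({"partyID": ["Democrat", np.nan, "Republican", None],
                    "ftbiden": [70, 50, 20, 40]})

# Boolean mask, as in the code above
by_mask = toy[~toy.partyID.isnull()]

# Equivalent: drop rows that are missing in the listed columns only
by_dropna = toy.dropna(subset=["partyID"])
```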
We can now use partyID to color-code the points:
fig = px.scatter(anes_scatter.head(200), x='ftbiden', y='fttrump',
color = 'partyID',
height=600, width=600,
labels={'ftbiden':'Joe Biden thermometer rating',
'fttrump':'Donald Trump thermometer rating'},
hover_data=['partyID', 'sex', 'state'],
title = 'Trump vs. Biden Feeling Thermometer Ratings')
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
If we allow the points to be partially transparent, then darker regions on the scatterplot represent areas with more data. For the static visualizations, the transparency parameter is named “alpha”, but for plotly.express it is named opacity. As with “alpha”, when opacity=1 (the default) the points are perfectly solid and when opacity=0 the points are fully transparent, so values between 0 and 1 represent degrees of opacity. We can change the color of every point to black, as black shows us most clearly where the density of points is highest. The color argument in plotly.express expects a feature rather than a single color name, however. One workaround is to create a new feature within the call to px.scatter() that is “black” on every row with
['black']*anes_scatter.shape[0]
Here anes_scatter.shape returns the dimensions of anes_scatter, and anes_scatter.shape[0] returns the number of rows. Multiplying ['black'] by a number creates a list with as many repetitions of “black” as the number we multiply it by. That sets the category of the new feature to “black” for every row in anes_scatter. Next we write
color_discrete_map = {'black':'black'}
inside the call to px.scatter(). This argument maps the category “black” to the color black. In the following graph, we change all the points to black and we set opacity=.1 to see the highest density regions:
fig = px.scatter(anes_scatter, x='ftbiden', y='fttrump',
opacity = .1,
color=['black']*anes_scatter.shape[0],
color_discrete_map = {'black':'black'},
height=600, width=600,
labels={'ftbiden':'Joe Biden thermometer rating',
'fttrump':'Donald Trump thermometer rating'},
hover_data=['partyID', 'sex', 'state'],
title = 'Trump vs. Biden Feeling Thermometer Ratings')
fig.update(layout=dict(title=dict(x=0.5)))
fig.update_layout(showlegend=False)
fig.show()
We can add a line-of-best-fit to the scatterplot using ordinary least squares by including trendline='ols':
fig = px.scatter(anes_scatter.head(200), x='ftbiden', y='fttrump',
trendline='ols',
height=600, width=600,
labels={'ftbiden':'Joe Biden thermometer rating',
'fttrump':'Donald Trump thermometer rating'},
hover_data=['partyID', 'sex', 'state'],
title = 'Trump vs. Biden Feeling Thermometer Ratings')
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
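The trendline options in plotly.express rely on the statsmodels package, so that package needs to be installed for the code above to run. To show what an ordinary least squares line is, the following sketch fits one by hand with numpy on made-up points that lie exactly on a line. It illustrates the technique and is not the code plotly runs internally.

```python
import numpy as np

# Made-up points that lie exactly on the line y = 90 - 0.8x
x = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
y = np.array([90.0, 70.0, 50.0, 30.0, 10.0])

# A degree-1 polynomial fit is the least squares line;
# np.polyfit returns the highest-order coefficient first
slope, intercept = np.polyfit(x, y, deg=1)

# Values of the fitted line at each x
fitted = intercept + slope * x
```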
To illustrate a nonlinear fitting curve, we can employ locally weighted scatterplot smoothing (LOWESS) by typing trendline='lowess':
fig = px.scatter(anes_scatter.head(200), x='ftbiden', y='fttrump',
trendline='lowess',
height=600, width=600,
labels={'ftbiden':'Joe Biden thermometer rating',
'fttrump':'Donald Trump thermometer rating'},
hover_data=['partyID', 'sex', 'state'],
title = 'Trump vs. Biden Feeling Thermometer Ratings')
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
Faceting works in the same way for scatterplots as it works for barplots. In the following graph, we use facet_col='partyID' to create three scatterplots, subsetting the data by party affiliation, and placing these plots next to each other in one row:
fig = px.scatter(anes_scatter, x='ftbiden', y='fttrump', facet_col='partyID',
hover_data=['partyID', 'sex', 'state'],
opacity = .1, color=['black']*anes_scatter.shape[0],
color_discrete_map = {'black':'black'},
labels={'ftbiden':'Joe Biden thermometer rating',
'fttrump':'Donald Trump thermometer rating'},
title = 'Trump vs. Biden Feeling Thermometer Ratings',
width=1000, height=400)
fig.update(layout=dict(title=dict(x=0.5)))
fig.update_layout(showlegend=False)
fig.for_each_annotation(lambda a: a.update(text=a.text.replace("partyID=", "")))
fig.show()
12.2.3. Lineplots
To demonstrate a lineplot, we can group the anes data by age to see the variation in the Biden and Trump thermometers across ages. The following code creates a dataframe with the mean, median, 25th and 75th percentiles, and the interquartile ranges for the Biden thermometer, then creates a second dataframe with the same information extracted from the Trump thermometer, then uses pd.concat() to stack these dataframes one on top of the other (the dataframe .append() method that served this purpose was removed in pandas 2.0):
def q25(x):
    return x.quantile(.25)
def q75(x):
    return x.quantile(.75)
def iqr(x):
    return x.quantile(.75) - x.quantile(.25)
anes_line = anes.query("age <= 85").groupby('age').agg({'ftbiden':['mean','median',q25, q75, iqr]})
anes_line.columns = anes_line.columns.droplevel()
anes_line = anes_line.reset_index()
anes_line['candidate'] = 'Joe Biden'
anes_line2 = anes.query("age <= 85").groupby('age').agg({'fttrump':['mean','median',q25, q75, iqr]})
anes_line2.columns = anes_line2.columns.droplevel()
anes_line2 = anes_line2.reset_index()
anes_line2['candidate'] = 'Donald Trump'
anes_line = pd.concat([anes_line, anes_line2]) # stack the two dataframes, keeping each one's row index
anes_line
 | age | mean | median | q25 | q75 | iqr | candidate |
---|---|---|---|---|---|---|---|
0 | 20 | 43.604651 | 43.0 | 30.00 | 56.00 | 26.0 | Joe Biden |
1 | 21 | 48.709677 | 45.0 | 27.00 | 71.50 | 44.5 | Joe Biden |
2 | 22 | 38.827586 | 38.0 | 16.00 | 52.00 | 36.0 | Joe Biden |
3 | 23 | 40.785714 | 40.5 | 28.75 | 55.75 | 27.0 | Joe Biden |
4 | 24 | 51.640000 | 54.0 | 28.00 | 73.00 | 45.0 | Joe Biden |
... | ... | ... | ... | ... | ... | ... | ... |
61 | 81 | 52.736842 | 64.0 | 4.50 | 98.00 | 93.5 | Donald Trump |
62 | 82 | 55.866667 | 90.0 | 0.50 | 92.00 | 91.5 | Donald Trump |
63 | 83 | 67.230769 | 91.0 | 7.00 | 100.00 | 93.0 | Donald Trump |
64 | 84 | 62.153846 | 88.0 | 20.00 | 100.00 | 80.0 | Donald Trump |
65 | 85 | 72.500000 | 92.0 | 60.75 | 97.75 | 37.0 | Donald Trump |
132 rows × 7 columns
If we have multiple features that we want to plot with lines on the same graph, and we want to use arguments such as color and line_dash to tell the lines apart, we need to arrange the data in the long format shown above.
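The aggregate-then-stack steps above can be sketched on a small made-up dataframe. The toy values and the helper name summarize() are assumptions for illustration only, and ignore_index=True is used here to renumber the rows, which the code above does not do:

```python
import pandas as pd

# Made-up toy data: two thermometer ratings for respondents of two ages.
toy = pd.DataFrame({
    "age": [20, 20, 21, 21],
    "ftbiden": [40.0, 60.0, 30.0, 50.0],
    "fttrump": [10.0, 30.0, 70.0, 90.0],
})

def q25(x):
    return x.quantile(0.25)

def q75(x):
    return x.quantile(0.75)

def summarize(df, column, label):
    """Summarize one rating column by age and tag the rows with a candidate label."""
    out = df.groupby("age")[column].agg(["mean", "median", q25, q75]).reset_index()
    out["candidate"] = label
    return out

# Stack the two summaries vertically: each row is one (age, candidate) pair.
toy_long = pd.concat(
    [summarize(toy, "ftbiden", "Joe Biden"),
     summarize(toy, "fttrump", "Donald Trump")],
    ignore_index=True,
)
print(toy_long)
```

Each candidate contributes one row per age, so a single column (candidate) identifies which line a row belongs to.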
The syntax for px.line()
, which creates a lineplot, follows the same format as px.bar()
and px.scatter()
. We can create a lineplot with age on the x-axis and the average thermometer on the y-axis with x='age', y='mean'
. To include both the Biden and Trump thermometers on the same graph, we set both color
and line_dash
equal to candidate
so that these two lines have both different colors and different line types. We use the same syntax we used above to label the axes, include a title, adjust the height and width of the figure, and include the median, quantiles, and interquartile range as hover data:
fig = px.line(anes_line, x='age', y='mean', color='candidate',
line_dash = 'candidate',
title='Feeling Thermometer Ratings By Age Group',
labels={'age':'Age',
'mean':'Average thermometer rating'},
hover_data=['median', 'q25', 'q75', 'iqr'],
height=600, width=800)
fig.update_layout(yaxis=dict(range=[0,100]))
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
For faceting, we include facet_col='candidate'
in the call to px.line()
:
fig = px.line(anes_line, x='age', y='mean', color='candidate', facet_col='candidate',
hover_data=['median', 'q25', 'q75', 'iqr'],
labels={'age':'Age',
'mean':'Average thermometer rating'},
title = 'Feeling Thermometer Ratings By Age Group',
width=1000, height=400)
fig.update(layout=dict(title=dict(x=0.5)))
fig.update_layout(showlegend=False)
fig.for_each_annotation(lambda a: a.update(text=a.text.replace("candidate=", "")))
fig.show()
12.2.4. Distributions#
In chapter 11 we discussed histograms, density plots, violin plots, and bar plots as ways to understand the entirety of a feature’s distribution as opposed to single descriptive statistics like means. We can generate the same figures with plotly.express
as well. To create a histogram, we can use px.histogram()
. We indicate the feature we want to visualize by setting it equal to x
for a vertically-oriented histogram or to y
for a horizontally-oriented histogram. We use the same syntax as other plotly.express
functions to label the axes and include a title. Here is a histogram for the Biden thermometer score:
fig = px.histogram(anes, x='ftbiden',
labels={'ftbiden':'Joe Biden thermometer rating'},
title = 'Distribution of Joe Biden Thermometer Ratings')
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
One useful default behavior of px.histogram() is that it tells us the range of each bin in the hover data. The first bin collects responses that rate Biden between 0 and 4, for example.
To change the number of bins, we can use the nbins
argument. We can also include a second distributional plot to provide more context to a histogram: if we attach a boxplot to the top of the histogram, we will be able to see the range, median, and 25th and 75th percentiles for the feature. We can include this plot by adding marginal='box'
to the call to px.histogram()
:
fig = px.histogram(anes, x='ftbiden', nbins=60, marginal='box',
labels={'ftbiden':'Joe Biden thermometer rating'},
title = 'Distribution of Joe Biden Thermometer Ratings')
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
We can use px.violin() for a violin plot. If we set both x and color equal to a categorical feature, px.violin() will draw one violin for each category side-by-side to make comparisons between these groups easier:
fig = px.violin(anes, y='ftbiden', x = 'sex', color = 'sex',
labels={'ftbiden':'Joe Biden thermometer rating', 'sex':''},
title = 'Distribution of Thermometer Ratings')
fig.update(layout=dict(title=dict(x=0.5)))
fig.update_layout(showlegend=False)
fig.show()
To compare the distributions of different features in the same data, we have to reshape the data to long-format so that the names of these features are contained in a categorical feature. For example, to generate violin plots for Biden, Trump, Obama, and Sanders, we first reshape the data so that these four features are stacked on top of one another as follows:
anes_cand = pd.melt(anes, id_vars = ['caseid'],
value_vars = ['ftbiden', 'fttrump',
'ftobama', 'ftsanders'])
anes_cand = anes_cand.rename({'variable':'candidate',
'value':'thermometer'}, axis=1)
anes_cand['candidate'] = anes_cand['candidate'].map({'ftbiden':'Joe Biden',
'fttrump':'Donald Trump',
'ftobama':'Barack Obama',
'ftsanders':'Bernie Sanders'})
anes_cand
 | caseid | candidate | thermometer |
---|---|---|---|
0 | 1 | Joe Biden | 52.0 |
1 | 2 | Joe Biden | 41.0 |
2 | 3 | Joe Biden | 88.0 |
3 | 4 | Joe Biden | 0.0 |
4 | 5 | Joe Biden | 25.0 |
... | ... | ... | ... |
12655 | 3161 | Bernie Sanders | 6.0 |
12656 | 3162 | Bernie Sanders | 92.0 |
12657 | 3163 | Bernie Sanders | 59.0 |
12658 | 3164 | Bernie Sanders | 79.0 |
12659 | 3165 | Bernie Sanders | 100.0 |
12660 rows × 3 columns
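The reshaping step is easier to follow on a tiny example. The sketch below uses two made-up respondents, and it names the new columns directly with the var_name and value_name arguments of pd.melt(), which is a shortcut for the separate rename step used above:

```python
import pandas as pd

# Made-up wide data: one row per respondent, one column per candidate.
toy = pd.DataFrame({
    "caseid": [1, 2],
    "ftbiden": [52.0, 41.0],
    "fttrump": [10.0, 95.0],
})

# Stack the two rating columns into one, recording the source column name.
toy_long = pd.melt(toy, id_vars=["caseid"],
                   value_vars=["ftbiden", "fttrump"],
                   var_name="candidate", value_name="thermometer")

# Replace the raw column names with readable candidate names.
toy_long["candidate"] = toy_long["candidate"].map({"ftbiden": "Joe Biden",
                                                   "fttrump": "Donald Trump"})
print(toy_long)
```

Two respondents and two rating columns produce four rows, just as 3,165 respondents and four rating columns produce the 12,660 rows shown above.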
Then we can set color='candidate'
to see the violin-representations of the distributions side-by-side:
fig = px.violin(anes_cand, y='thermometer', x = 'candidate', color = 'candidate',
labels={'thermometer':'Feeling thermometer rating', 'candidate':''},
title = 'Distribution of Thermometer Ratings')
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
The exact same notation works for boxplots if we change the function to px.box(). In this case, we switch the x and y features for a horizontally-oriented graph:
fig = px.box(anes_cand, x='thermometer', y = 'candidate', color = 'candidate',
labels={'thermometer':'Feeling thermometer rating', 'candidate':''},
title = 'Distribution of Thermometer Ratings')
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
12.2.5. Interactive Maps#
Maps in plotly.express are called choropleth maps. The word “choropleth” refers to the act of shading specific areas on a map to represent differences between these areas or to represent data for these areas on the map. A choropleth map contains two parts: a base layer that consists of a particular map, and data that is mapped to shadings that are applied to areas on this map.
There are two ways to use a base layer map in a px.choropleth()
figure. We can supply our own map if the map is coded in a GeoJSON format. GeoJSON files are JSONs in which shapes of geographic areas are supplied with a series of coordinates. GeoJSON files are an important tool for geospatial data analysis, but they are beyond the scope of this discussion.
Alternatively, we can use one of the two maps that are already installed into plotly
and available for us to use with px.choropleth()
. One of these maps is a world map with country borders, and the other is a map of the United States with state borders. As an example of color-coding a world map, we can use the data from the Varieties of Democracy project, which evaluates the quality of democracy for every country in the world. We load the data and keep only the data from the year 2010 (when there are fewer missing values), the country name, the three-letter country ID, and the democracy score:
VDem_url = "https://github.com/jkropko/DS-6001/raw/master/localdata/vdem.csv"
vdem = pd.read_csv(VDem_url)
vdem = vdem.query("year==2010")
vdem = vdem[['country_name', 'country_text_id', 'v2x_polyarchy']]
vdem = vdem.rename({'v2x_polyarchy':'democracy'}, axis=1)
vdem
 | country_name | country_text_id | democracy |
---|---|---|---|
50 | Mexico | MEX | 0.670711 |
105 | Suriname | SUR | 0.825037 |
161 | Sweden | SWE | 0.929517 |
216 | Switzerland | CHE | 0.934928 |
271 | Ghana | GHA | 0.785304 |
... | ... | ... | ... |
8340 | Slovakia | SVK | 0.800881 |
8364 | Slovenia | SVN | 0.831588 |
8420 | Solomon Islands | SLB | 0.612996 |
8476 | Vanuatu | VUT | 0.611565 |
8531 | Hungary | HUN | 0.786615 |
168 rows × 3 columns
These three letter codes are ISO-3 codes, which are used by the United Nations and other international organizations as a unique ID for a country, which is necessary because the same country can have different names or different spellings of the same name. In order for px.choropleth()
to match the data to the areas on the world map that correspond to the countries, we must have ISO-3 codes. To create a world map in which the countries are color-coded by democratic quality, we pass the data to px.choropleth()
along with locations
set equal to the feature that contains ISO-3 codes. Then we can set color='democracy'
to color-code by the democracy score, and we can include hover_name='country_name'
to allow the full country names to appear when the user’s mouse hovers over the country on the map. This map is:
fig = px.choropleth(vdem, locations='country_text_id',
color='democracy',
hover_name='country_name',
title='Democracy in the World, 2010',
width=1000, height=800)
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()
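Because the map depends on the ISO-3 codes, it can be worth checking the code column before plotting. The following is only a sketch of one possible sanity check, using made-up rows: it flags codes that are not exactly three uppercase letters. It checks the form of a code only, so a well-formed but nonexistent code such as "ATL" would still pass:

```python
import pandas as pd

# Made-up rows: "GH" is malformed; "ATL" is well-formed but not a real country code.
toy = pd.DataFrame({
    "country_name": ["Mexico", "Sweden", "Ghana", "Atlantis"],
    "country_text_id": ["MEX", "SWE", "GH", "ATL"],
})

# True where the code is exactly three uppercase letters.
well_formed = toy["country_text_id"].str.fullmatch(r"[A-Z]{3}")

# Names of the countries whose codes need attention.
bad_names = toy.loc[~well_formed, "country_name"].tolist()
print(bad_names)
```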
To zoom on a choropleth map, we can place the mouse on the map and use the same mouse functions we use for scrolling up and down on a page.
To create a map of U.S. states color-coded by a feature in our data, we must have the official two-letter postal code state abbreviations in the data. We have this information stored in the ANES in the state_abb
feature. The following data groups the ANES by state and generates a count of the observations from each state:
anes_state = anes.groupby(['state_abb', 'state']).size().reset_index()
anes_state = anes_state.rename({0:'count'}, axis=1)
anes_state
 | state_abb | state | count |
---|---|---|---|
0 | AK | Alaska | 7 |
1 | AL | Alabama | 58 |
2 | AR | Arkansas | 24 |
3 | AZ | Arizona | 97 |
4 | CA | California | 279 |
5 | CO | Colorado | 50 |
6 | CT | Connecticut | 37 |
7 | DC | District of Columbia | 7 |
8 | DE | Delaware | 11 |
9 | FL | Florida | 261 |
10 | GA | Georgia | 86 |
11 | HI | Hawaii | 6 |
12 | IA | Iowa | 31 |
13 | ID | Idaho | 14 |
14 | IL | Illinois | 113 |
15 | IN | Indiana | 67 |
16 | KS | Kansas | 21 |
17 | KY | Kentucky | 49 |
18 | LA | Louisiana | 39 |
19 | MA | Massachusetts | 51 |
20 | MD | Maryland | 60 |
21 | ME | Maine | 17 |
22 | MI | Michigan | 105 |
23 | MN | Minnesota | 56 |
24 | MO | Missouri | 69 |
25 | MS | Mississippi | 25 |
26 | MT | Montana | 20 |
27 | NC | North Carolina | 100 |
28 | ND | North Dakota | 9 |
29 | NE | Nebraska | 20 |
30 | NH | New Hampshire | 26 |
31 | NJ | New Jersey | 79 |
32 | NM | New Mexico | 24 |
33 | NV | Nevada | 35 |
34 | NY | New York | 185 |
35 | OH | Ohio | 123 |
36 | OK | Oklahoma | 31 |
37 | OR | Oregon | 52 |
38 | PA | Pennsylvania | 163 |
39 | RI | Rhode Island | 8 |
40 | SC | South Carolina | 59 |
41 | SD | South Dakota | 9 |
42 | TN | Tennessee | 76 |
43 | TX | Texas | 210 |
44 | UT | Utah | 27 |
45 | VA | Virginia | 112 |
46 | VT | Vermont | 8 |
47 | WA | Washington | 66 |
48 | WI | Wisconsin | 54 |
49 | WV | West Virginia | 24 |
50 | WY | Wyoming | 5 |
We can use this dataframe to create a map that shows us where the ANES draws its sample from. We pass the data to px.choropleth()
and map the data to the states on the map by specifying the two-letter abbreviations with locations='state_abb'
. By default, px.choropleth()
will use the world map. To use the U.S. state map instead we specify locationmode='USA-states', scope='usa'
. In this case we color-code by the count.
fig = px.choropleth(anes_state, locations='state_abb',
hover_name='state', color='count',
locationmode='USA-states', scope='usa')
fig.show()
More people surveyed by the ANES live in the states with lighter shades, so many people in the sample live in California, Florida, Texas, and New York.
12.3. Creating Dashboards#
To communicate our findings to an audience, we need to be able to share our code and output with the audience. There are many ways to do that. We can write a paper that includes the tables and graphs we want to share and distribute it as a PDF, but a PDF is a difficult medium for sharing code as it can be hard to copy-and-paste the code into a text editor from a PDF. We can write Jupyter notebooks and share the .ipynb file or export it to an HTML file, but while Jupyter is a great initial format for sharing text, code, and output, it offers us little control over the appearance of the notebook: it’s hard to change the font of the text in a notebook, for example, without writing custom CSS code.
Another way to share findings with an audience is to create a webpage. Unlike a Jupyter notebook, there are a myriad of straightforward ways to change the appearance of a webpage to make the code and results easy for users to read and understand. A website that displays interactive tables and visualizations from a data analysis is called a dashboard. According to Stelian Subotin,
Dashboards are a unique and powerful way to present data-based intelligence using data visualization techniques that display relevant, actionable data as well as track stats and key performance indicators … . Dashboards should present this data in a quick, easy-to-scan format with the most relevant information understandable at a glance.
There are many software packages for creating dashboards, some of which are free and open-source, and others of which are proprietary. Here we will focus on using dash
, which is made by plotly
, and works well with plotly
graphics. The topic of dashboard design is an important part of front-end software development, and information on approaches and techniques for this topic can easily fill several books. The following discussion is intended only to be an introduction. This tutorial can help demonstrate some of the more advanced features of dash
, and here is a gallery of some beautiful dashboards that were created with dash
.
12.3.1. Principles of User Experience (UX) Design#
The process of designing a webpage in general, or a dashboard specifically, with the needs of an audience in mind is called user experience (UX) design. UX is a huge subject and it takes a great deal of training and experience to master the skills necessary for effective UX design. But it is a good idea to think about some principles that will guide us as we design dashboards. Subotin cautions against including too much data in one dashboard because “the more information we display, the harder it is for users to find what they need.” Instead, we should think carefully about the goals of the project, the properties of the data, and needs of the users. Based on these considerations, we need to prioritize the data and display only the most relevant and important data on the dashboard. As with static visualizations, dashboards need to be used to tell a story, and that story must be understandable.
It can be difficult to know how to prioritize data, visualizations, and design elements. Subotin recommends beginning by helping a client to set goals that are Specific, Measurable, Actionable, Realistic, and Time-Based (SMART). That is, these goals should set unambiguous objectives that are quantifiable and can be used by the client to make decisions. These goals should be feasible given the quality and size of the data and the time constraints of the client. As UX designers, we need to be aware of the client’s goals and design a dashboard that helps the client achieve these goals. That initial thought process will help us choose the data and visualizations that should appear on the dashboard.
Subotin suggests that the most important data be displayed on the primary display of the dashboard, and that additional and secondary data be accessible through options, menus, or buttons. A mechanism to show more and more data as a user requests it is called progressive disclosure.
Finally, it is important to make every design decision with the user in mind. Among these decisions:
Does the design consider the direction the visitor is used to reading in?
Does interaction with the dashboard require technical knowledge?
Will users manage to accomplish most of the actions in just a few clicks?
Our goal should be to design a dashboard that users will be able to read quickly and intuitively. We need to work to avoid situations in which a user is confused about how to use the dashboard or how to find a particular datapoint or visualization. We, as designers, do not share the same mindset as a new user of the dashboard, and it can be hard to know what will be confusing to users. UX designers often employ user testing to understand how users are interacting with a dashboard. There are many methods for conducting a user test, but one way is to ask a user to speak their thoughts aloud as they use a dashboard or app for the first time as designers listen but do not speak or intervene.
12.3.2. Using dash#
In this section we will build an entire dashboard using dash
to communicate some of our findings from the ANES data. dash
can display text, tables, static and interactive data visualizations, and can accept user inputs to change tables and graphics on the dashboard. Using dash
can be very challenging, so we will build up to a complete dashboard by starting with a very simple interface and iteratively adding features.
To create interactive tables and visualizations, we load numpy, pandas, and the following modules from plotly:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
To use dash
, we need the following packages and modules:
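import dash
from jupyter_dash import JupyterDash
# In dash 2.0 and later, dcc and html are part of the dash package itself;
# the standalone dash_core_components and dash_html_components packages are deprecated
from dash import dcc, html
from dash.dependencies import Input, Output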
Web-based applications like dashboards use a programming language called cascading style sheets (CSS) to set global parameters that control the appearance of the elements of the app, including the layout, colors, and fonts of these elements. Serious front-end developers spend a lot of time programming in CSS to exert fine-tuned control over the appearance and functionality of a web-application. For our purposes, we can take advantage of a feature of dash
that allows us to use an external CSS stylesheet for our own app. We will be using the following stylesheet in our examples:
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
12.3.2.1. Displaying a Dashboard Inside a Jupyter Notebook#
Until recently, dashboards were incompatible with Jupyter notebooks. We would have had to write a Python script - a plain text file with a .py extension that contains executable Python code - and run it to generate the dashboard. However, a recent Python package called jupyterdash allows us to display dashboards directly inside a notebook. There are two primary differences between dash and the JupyterDash() function: how the code begins, and how the code ends. Everything we use to populate and organize the dashboard is the same no matter whether we use dash or jupyterdash.
To create the initial Python variable that will contain the dashboard app using dash, we type
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
The first argument of this function must always be __name__. According to the dash documentation:
it is important to set the name parameter of the Dash instance to the value
__name__
, so that Dash can correctly detect the location of any static assets inside an assets directory for this Dash app.
The second argument registers the external CSS stylesheet we defined above. To create the dashboard app using jupyterdash
, we write instead
app = JupyterDash(__name__, external_stylesheets=external_stylesheets)
To complete the code for the app and to launch it using dash, we will type
if __name__ == '__main__':
    app.run_server(debug=True)
If the dash code executes correctly, then we will see a URL for a locally-stored webpage, such as
Dash app running on http://127.0.0.1:8050/
We can copy-and-paste this address into a browser to see the app. If instead we are using jupyterdash
, we can type
if __name__ == '__main__':
    app.run_server(mode='inline', debug=True)
to see the dashboard inside our notebook, or
if __name__ == '__main__':
    app.run_server(mode='external', debug=True)
to use a local web-address to see the dashboard in a separate browser window.
12.3.2.2. Collecting the Elements for the Dashboard#
The first step in creating a dashboard is to decide what elements will be displayed on the dashboard. Then we can generate all of these elements first, so that the code to create the dashboard is simpler.
First, I want a title: “Exploring the 2019 American National Election Pilot Study”. Since the title is short, there’s no need to save it as a separate Python variable.
Second, I want some text that explains to the audience what the ANES data is. I wrote the following text in chapter 8 when we first used this dataframe as an example of how to use pandas
:
The American National Election Study (ANES) is a massive public opinion survey conducted after every national election. It is one of the greatest sources of data available about the voting population of the United States. It contains far more information than a typical public opinion poll. Iterations of the survey contain thousands of features from thousands of respondents, and examines people’s attitudes on the election, the candidates, the parties, it collects massive amounts of demographic information and other characteristics from voters, and it records people’s opinions on a myriad of political and social issues.
Prior to each election the ANES conducts a “pilot study” that asks many of the questions that will be asked on the post-election survey. The idea is to capture a snapshot of the American electorate prior to the election and to get a sense of how the survey instrument is working so that adjustments can be made in time. Here we will work with the 2019 ANES pilot data. To understand the features and the values used to code responses, the data have an associated questionnaire and codebook. The pilot data were collected in December 2019 and contain 900 features collected from 3,165 respondents.
A dashboard can parse markdown code, which I used to format the above text. I save the markdown code as a separate variable:
markdown_text = '''
The [American National Election Study](https://electionstudies.org) (ANES) is a massive public opinion survey conducted after every national election. It is one of the greatest sources of data available about the voting population of the United States. It contains far more information than a typical public opinion poll. Iterations of the survey contain thousands of features from thousands of respondents, and examines people's attitudes on the election, the candidates, the parties, it collects massive amounts of demographic information and other characteristics from voters, and it records people's opinions on a myriad of political and social issues.
Prior to each election the ANES conducts a "pilot study" that asks many of the questions that will be asked on the post-election survey. The idea is to capture a snapshot of the American electorate prior to the election and to get a sense of how the survey instrument is working so that adjustments can be made in time. Here we will work with the [2019 ANES pilot data](https://electionstudies.org/data-center/2019-pilot-study/). To understand the features and the values used to code responses, the data have an associated [questionnaire](https://electionstudies.org/wp-content/uploads/2020/02/anes_pilot_2019_questionnaire.pdf) and [codebook](https://electionstudies.org/wp-content/uploads/2020/02/anes_pilot_2019_userguidecodebook.pdf). The pilot data were collected in December 2019 and contain 900 features collected from 3,165 respondents.
'''
Next I want to display a table that lists the number of votes people say they will cast for each candidate, along with the average age of each voter group, and the urban/rural distribution of the groups:
anes_display = anes.groupby('vote').agg({'vote':'size',
'age':'mean'})
anes_display['percent'] = 100*anes_display.vote / sum(anes_display.vote)
anes_display = pd.merge(anes_display, 100*pd.crosstab(anes.vote, anes.liveurban, normalize='index'),
left_index=True, right_index=True)
anes_display = anes_display[['vote', 'percent', 'age',
'City', 'Rural', 'Suburb', 'Town']]
anes_display = anes_display.rename({'vote':'Votes',
'age':'Avg. age',
'percent':'Percent',
'City':'% City',
'Rural':'% Rural',
'Suburb':'% Suburban',
'Town':'% Town'}, axis=1)
anes_display = round(anes_display, 2)
anes_display = anes_display.reset_index().rename({'vote':'Candidate'}, axis=1)
anes_display
 | Candidate | Votes | Percent | Avg. age | % City | % Rural | % Suburban | % Town |
---|---|---|---|---|---|---|---|---|
0 | Donald Trump | 1273 | 40.22 | 56.98 | 18.46 | 24.67 | 36.53 | 20.35 |
1 | Joe Biden | 1288 | 40.70 | 51.32 | 31.13 | 15.84 | 35.87 | 17.16 |
2 | Probably will not vote | 283 | 8.94 | 40.32 | 31.10 | 21.55 | 29.33 | 18.02 |
3 | Someone else | 321 | 10.14 | 45.60 | 28.66 | 19.94 | 33.33 | 18.07 |
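The urban/rural columns in this table come from pd.crosstab() with normalize='index', which converts the counts in each row to shares of that row's total. A minimal sketch on made-up data, where the vote labels "A" and "B" are placeholders:

```python
import pandas as pd

# Made-up responses: four "A" voters and two "B" voters.
toy = pd.DataFrame({
    "vote": ["A", "A", "A", "A", "B", "B"],
    "liveurban": ["City", "City", "Rural", "Suburb", "City", "Town"],
})

# normalize='index' divides each row by its row total, so every row sums to 1;
# multiplying by 100 turns the shares into percentages.
row_pct = 100 * pd.crosstab(toy.vote, toy.liveurban, normalize="index")
print(row_pct.round(2))
```

Each row of the result sums to 100, so the percentages describe where the supporters of one candidate live, not what share of city residents support that candidate.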
To format this table in an interactive and web-enabled way, I pass the table to the ff.create_table()
function:
table = ff.create_table(anes_display)
table.show()
Next I want to display a barplot, line plot, violin plot, and a map on the dashboard. We already created the first three figures using plotly above. I copy the code and save the figures as fig_bar, fig_line, and fig_vio respectively; the map, fig_map, is built further below:
fig_bar = px.bar(anes_groupbar, x='partyID', y='rowpercent', color='partyID',
facet_col='vote', facet_col_wrap=2,
hover_data = ['votes', 'Biden thermometer', 'Trump thermometer'],
labels={'partyID':'Party Identification', 'rowpercent':'Percent'},
text='rowtext', width=1000, height=600)
fig_bar.update(layout=dict(title=dict(x=0.5)))
fig_bar.update_layout(showlegend=False)
fig_bar.for_each_annotation(lambda a: a.update(text=a.text.replace("vote=", "")))
fig_bar.show()
fig_line = px.line(anes_line, x='age', y='mean', color='candidate',
line_dash = 'candidate',
labels={'age':'Age',
'mean':'Average thermometer rating'},
hover_data=['median', 'q25', 'q75', 'iqr'],
height=400, width=600)
fig_line.update_layout(yaxis=dict(range=[0,100]))
fig_line.update(layout=dict(title=dict(x=0.5)))
fig_line.show()
fig_vio = px.violin(anes_cand, y='thermometer', x = 'candidate', color = 'candidate',
labels={'thermometer':'Feeling thermometer rating', 'candidate':''},
title = 'Distribution of Thermometer Ratings')
fig_vio.update(layout=dict(title=dict(x=0.5)))
fig_vio.show()
I also want a map in which states are color-coded to be red if more people in the ANES intend to vote for Trump in the state than for Biden, blue if more people intend to vote for Biden than for Trump, and purple if there is a tie. First I generate the data that contains the counts by state of the Trump and Biden votes, along with a feature that denotes the result:
anes_state = pd.crosstab(anes.state_abb, anes.vote)
anes_state = anes_state[['Donald Trump', 'Joe Biden']].reset_index()
anes_state['difference'] = anes_state['Donald Trump'] - anes_state['Joe Biden']
anes_state['result'] = pd.cut(anes_state.difference, [-100, -.00001, 0, 100], labels=['biden','tie','trump'])
anes_state = pd.merge(anes_state, anes.groupby(['state', 'state_abb']).size().reset_index(), on='state_abb')
anes_state = anes_state.rename({0:'voters'}, axis=1)
anes_state
 | state_abb | Donald Trump | Joe Biden | difference | result | state | voters |
---|---|---|---|---|---|---|---|
0 | AK | 2 | 4 | -2 | biden | Alaska | 7 |
1 | AL | 26 | 24 | 2 | trump | Alabama | 58 |
2 | AR | 11 | 7 | 4 | trump | Arkansas | 24 |
3 | AZ | 56 | 28 | 28 | trump | Arizona | 97 |
4 | CA | 93 | 122 | -29 | biden | California | 279 |
5 | CO | 19 | 20 | -1 | biden | Colorado | 50 |
6 | CT | 15 | 16 | -1 | biden | Connecticut | 37 |
7 | DC | 0 | 6 | -6 | biden | District of Columbia | 7 |
8 | DE | 5 | 5 | 0 | tie | Delaware | 11 |
9 | FL | 118 | 105 | 13 | trump | Florida | 261 |
10 | GA | 40 | 27 | 13 | trump | Georgia | 86 |
11 | HI | 1 | 5 | -4 | biden | Hawaii | 6 |
12 | IA | 12 | 17 | -5 | biden | Iowa | 31 |
13 | ID | 7 | 4 | 3 | trump | Idaho | 14 |
14 | IL | 33 | 50 | -17 | biden | Illinois | 113 |
15 | IN | 27 | 27 | 0 | tie | Indiana | 67 |
16 | KS | 10 | 6 | 4 | trump | Kansas | 21 |
17 | KY | 21 | 13 | 8 | trump | Kentucky | 49 |
18 | LA | 12 | 16 | -4 | biden | Louisiana | 39 |
19 | MA | 10 | 31 | -21 | biden | Massachusetts | 51 |
20 | MD | 19 | 27 | -8 | biden | Maryland | 60 |
21 | ME | 9 | 5 | 4 | trump | Maine | 17 |
22 | MI | 41 | 46 | -5 | biden | Michigan | 105 |
23 | MN | 18 | 26 | -8 | biden | Minnesota | 56 |
24 | MO | 33 | 28 | 5 | trump | Missouri | 69 |
25 | MS | 9 | 11 | -2 | biden | Mississippi | 25 |
26 | MT | 10 | 6 | 4 | trump | Montana | 20 |
27 | NC | 39 | 42 | -3 | biden | North Carolina | 100 |
28 | ND | 5 | 3 | 2 | trump | North Dakota | 9 |
29 | NE | 9 | 6 | 3 | trump | Nebraska | 20 |
30 | NH | 8 | 15 | -7 | biden | New Hampshire | 26 |
31 | NJ | 35 | 28 | 7 | trump | New Jersey | 79 |
32 | NM | 8 | 9 | -1 | biden | New Mexico | 24 |
33 | NV | 14 | 18 | -4 | biden | Nevada | 35 |
34 | NY | 56 | 91 | -35 | biden | New York | 185 |
35 | OH | 48 | 51 | -3 | biden | Ohio | 123 |
36 | OK | 14 | 11 | 3 | trump | Oklahoma | 31 |
37 | OR | 28 | 16 | 12 | trump | Oregon | 52 |
38 | PA | 81 | 55 | 26 | trump | Pennsylvania | 163 |
39 | RI | 2 | 2 | 0 | tie | Rhode Island | 8 |
40 | SC | 26 | 25 | 1 | trump | South Carolina | 59 |
41 | SD | 3 | 2 | 1 | trump | South Dakota | 9 |
42 | TN | 40 | 25 | 15 | trump | Tennessee | 76 |
43 | TX | 91 | 78 | 13 | trump | Texas | 210 |
44 | UT | 12 | 8 | 4 | trump | Utah | 27 |
45 | VA | 48 | 48 | 0 | tie | Virginia | 112 |
46 | VT | 2 | 4 | -2 | biden | Vermont | 8 |
47 | WA | 20 | 32 | -12 | biden | Washington | 66 |
48 | WI | 14 | 27 | -13 | biden | Wisconsin | 54 |
49 | WV | 12 | 6 | 6 | trump | West Virginia | 24 |
50 | WY | 1 | 4 | -3 | biden | Wyoming | 5 |
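The result feature comes from pd.cut(), whose bins are open on the left and closed on the right by default. With the edges used above, negative differences fall in the first bin, a difference of exactly 0 falls in the second, and positive differences fall in the third. A small sketch on made-up differences:

```python
import pandas as pd

# Made-up Trump-minus-Biden differences for five hypothetical states.
difference = pd.Series([-29, -1, 0, 2, 28])

# Bins: (-100, -0.00001] -> biden, (-0.00001, 0] -> tie, (0, 100] -> trump
result = pd.cut(difference, [-100, -.00001, 0, 100],
                labels=["biden", "tie", "trump"])
labels = list(result)
print(labels)

# Values outside the outer edges are not assigned to any bin.
outside = pd.cut(pd.Series([150]), [-100, -.00001, 0, 100],
                 labels=["biden", "tie", "trump"])
print(outside.isna().tolist())
```

The outer edges of -100 and 100 are wide enough for the ANES counts shown above, but a difference beyond them would come back as a missing value, so the edges need to be widened for larger samples.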
I pass this dataframe to px.choropleth()
and I use color_discrete_map
to match the results to the colors I want:
fig_map = px.choropleth(anes_state, locations='state_abb',
hover_name='state', hover_data = ['Donald Trump', 'Joe Biden', 'difference', 'voters'],
locationmode='USA-states', color='result', scope="usa",
color_discrete_map = {'biden':'blue',
'tie':'purple',
'trump':'red'})
fig_map.show()
In addition to the text, table, and figures shown above, we can create a scatterplot in which the user can specify the data that goes on the x and y axes, and can choose a categorical feature to color-code the points. To create that scatterplot, we will need the following data:
ft_columns = [col for col in anes if col.startswith('ft')]
cat_columns = ['sex', 'partyID', 'vote', 'ideology']
anes_ft = anes[ft_columns + cat_columns].dropna()
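The three lines above can be traced on a made-up dataframe. Iterating over a dataframe yields its column names, so the list comprehension keeps every column whose name starts with "ft". The toy columns and values below are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Made-up data with missing values in one thermometer and one categorical column.
toy = pd.DataFrame({
    "ftbiden": [52.0, np.nan, 88.0],
    "fttrump": [10.0, 95.0, 3.0],
    "sex": ["Female", "Male", None],
    "age": [34, 51, 67],
})

# Iterating over a dataframe yields its column names.
ft_cols = [col for col in toy if col.startswith("ft")]
cat_cols = ["sex"]

# dropna() removes every row with a missing value in any selected column.
toy_ft = toy[ft_cols + cat_cols].dropna()
print(toy_ft)
```

Only the first row survives, because each of the other two rows has a missing value in one of the selected columns. On the full ANES data, this means the scatterplot uses only respondents who answered every thermometer and categorical question.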
We will use the title, markdown_text, table, fig_bar, fig_line, fig_vio, fig_map, ft_columns, cat_columns, and anes_ft to create the dashboard. We will add these elements one at a time.
12.3.2.3. Creating a Dashboard With Only a Title and Text#
Let’s start with a dashboard that only contains the title. A dashboard has three parts:
1. An initial definition of the app variable, which contains all of the dashboard code, with app = JupyterDash(__name__, external_stylesheets=external_stylesheets)
2. Placing elements onto the dashboard and choosing their positions with app.layout (more on this step below).
3. Running the dashboard and displaying it in the notebook with if __name__ == '__main__': app.run_server(mode='inline', debug=True)
Dashboard code is like a sandwich. We can keep the beginning and ending of the code fixed, and add more and more elements to the middle. To add elements to the dashboard, the app.layout attribute must be set equal to html.Div(), which contains a list of HTML elements. To add a title, we use the html.H1() function inside this list. H1 is the equivalent of a single # sign in Markdown, and provides title-sized text. The code to create a dashboard with only a title is:
app = JupyterDash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div(
    [
        html.H1("Exploring the 2019 American National Election Pilot Study")
    ]
)
if __name__ == '__main__':
    app.run_server(mode='inline', debug=True, port=8050)
To add more elements to the dashboard, we can add more elements to the list inside the html.Div() function.
Next, to add the Markdown text that explains the ANES data, we can use the dcc.Markdown() function, passing the text we defined above. I am including spaces between the elements of the list inside html.Div(), not because they are necessary, but because they make the code easier to read. The code is as follows:
app = JupyterDash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div(
    [
        html.H1("Exploring the 2019 American National Election Pilot Study"),

        dcc.Markdown(children = markdown_text)
    ]
)
if __name__ == '__main__':
    app.run_server(mode='inline', debug=True, port=8050)
I added the port argument to my code because in writing this notebook I ran many dash apps, and eventually I got an error that said my default port was in use. The default port is 8050, so if this error appears we can solve it by changing the port to another value, such as 8051. But if we are not getting an error like this, there is no need to include the port argument in the code.
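One way to find out whether the default port is already taken is to try binding to it before launching the app. The sketch below uses only Python's standard socket module; port_is_free() and first_free_port() are hypothetical helper names introduced here for illustration and are not part of dash.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if we can bind to (host, port), False if it is already in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding failed, most likely because another process holds the port.
            return False
        return True


def first_free_port(start=8050, attempts=20, host="127.0.0.1"):
    """Scan upward from `start` and return the first port that can be bound."""
    for port in range(start, min(start + attempts, 65536)):
        if port_is_free(port, host):
            return port
    raise RuntimeError("no free port found in the scanned range")


if __name__ == "__main__":
    print(first_free_port())
```

The number returned by first_free_port() could then be passed as the port argument to app.run_server().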
If we want to view the dashboard on an external (local) website, we can change the last line of code so that app.run_server() is called with mode='external' instead of mode='inline' (and, in this example, with port=8051).
If we copy-and-paste the address that is displayed (http://127.0.0.1:8051/ in this case) into a web browser, we will see our dashboard working outside the context of a Jupyter notebook.
12.3.2.4. Adding Web-Enabled Tables and Figures to the Dashboard#
We used the ff.create_table() function above to convert a pandas dataframe into a web-formatted table. We can add this table to the dashboard by using the dcc.Graph() function in which the figure attribute is set to the table variable we created above. We can also create a subtitle for this table, “Comparing Trump and Biden Voters”, using the html.H2() function. The code for this dashboard is:
app = JupyterDash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div(
    [
        html.H1("Exploring the 2019 American National Election Pilot Study"),

        dcc.Markdown(children = markdown_text),

        html.H2("Comparing Trump and Biden Voters"),
        dcc.Graph(figure=table)
    ]
)
if __name__ == '__main__':
    app.run_server(mode='inline', debug=True, port=8050)
We can use dcc.Graph() to include figures, just as we included the table. Here we can include the barplot and the violin plot, both with subtitles:
app = JupyterDash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div(
    [
        html.H1("Exploring the 2019 American National Election Pilot Study"),

        dcc.Markdown(children = markdown_text),

        html.H2("Comparing Trump and Biden Voters"),
        dcc.Graph(figure=table),

        html.H2("Vote Choice By Party"),
        dcc.Graph(figure=fig_bar),

        html.H2("Distribution of Support for Political Figures"),
        dcc.Graph(figure=fig_vio)
    ]
)
if __name__ == '__main__':
    app.run_server(mode='inline', debug=True, port=8050)
12.3.2.5. Adding Figures Side-by-Side#
If we continue to add elements to the dashboard in a single list within html.Div(), then the elements will appear stacked one on top of the next in the order they are listed. There are times, however, when we want to include elements side-by-side. A horizontal orientation can improve the flow of the dashboard and can provide context to the information by juxtaposing elements.
The next version of the dashboard places the map and the line plot side by side. To do so, we write two new list items that are themselves calls to the html.Div() function. This function has a style parameter that controls the position of the HTML elements inside the list. For the first call to html.Div() we list a subtitle and the map, and set style = {'width':'48%', 'float':'left'}. 'width':'48%' sets the size of this frame to not quite half of the screen, and 'float':'left' aligns this frame on the left edge of the screen. For the second call to html.Div() we list a subtitle and the line plot, and set style = {'width':'48%', 'float':'right'}. Setting both percentages slightly less than 50% avoids overlap and adds a comfortable amount of white space between these two elements.
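The width arithmetic can be made explicit with a small helper. This is only a sketch: two_column_styles() is a hypothetical function introduced here for illustration (it is not part of dash), and it simply builds the two style dictionaries so that the widths and the gap between them add up to 100%.

```python
def two_column_styles(left_pct, gap_pct):
    """Return (left_style, right_style) dicts for two floated html.Div() frames.

    The right-hand frame gets whatever width remains after the left frame
    and the white-space gap are subtracted from 100%.
    """
    right_pct = 100 - left_pct - gap_pct
    if left_pct <= 0 or gap_pct < 0 or right_pct <= 0:
        raise ValueError("widths must be positive and leave room for both frames")
    left_style = {'width': f'{left_pct}%', 'float': 'left'}
    right_style = {'width': f'{right_pct}%', 'float': 'right'}
    return left_style, right_style


# Two equal frames separated by 4% of white space.
left_style, right_style = two_column_styles(left_pct=48, gap_pct=4)
print(left_style)   # {'width': '48%', 'float': 'left'}
print(right_style)  # {'width': '48%', 'float': 'right'}
```

Each dictionary could then be passed as the style argument of the corresponding html.Div() call.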
The dashboard that adds these two side-by-side figures is as follows:
app = JupyterDash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div(
    [
        html.H1("Exploring the 2019 American National Election Pilot Study"),

        dcc.Markdown(children = markdown_text),

        html.H2("Comparing Trump and Biden Voters"),
        dcc.Graph(figure=table),

        html.H2("Vote Choice By Party"),
        dcc.Graph(figure=fig_bar),

        html.H2("Distribution of Support for Political Figures"),
        dcc.Graph(figure=fig_vio),

        html.Div([
            html.H2("Vote Choice By State"),
            dcc.Graph(figure=fig_map)
        ], style = {'width':'48%', 'float':'left'}),

        html.Div([
            html.H2("Support by Age Group"),
            dcc.Graph(figure=fig_line)
        ], style = {'width':'48%', 'float':'right'})
    ]
)
if __name__ == '__main__':
    app.run_server(mode='inline', debug=True, port=8050)
12.3.2.6. Adding User-Inputs to Alter Dashboard Elements#
plotly graphics are interactive because the data that a plotly visualization displays depends on where the user hovers the mouse. But we can add additional interactivity to the dashboard by including drop-down menus, sliders, and other tools that allow a user to specify the exact type of graph and the graph’s aesthetics. In the following example, we will add an element to the dashboard that allows the user to choose which of the feeling thermometer features to place on the x and y axes of a scatterplot, and to optionally choose a categorical feature to use for the colors of the points.
For this interactive scatterplot, I want two elements side-by-side: dropdown menus on the left, taking up about 25% of the screen, and the scatterplot on the right, taking about 70% of the screen (with the remaining 5% white space separating these two elements). We can use the code we used above to place two figures side-by-side:
html.Div([
    # the dropdown menus go here
], style={'width': '25%', 'float': 'left'}),
html.Div([
    # the scatterplot goes here
], style={'width': '70%', 'float': 'right'})
To create a dropdown menu, we use the dcc.Dropdown() function. We pass three arguments to this function. First we specify an id. The id can be any string we want: the purpose of an id is to have a name for the menu that we can refer to later when we create the figure. Second, under options, we specify a list of the options that will appear in the dropdown menu. Third, value sets a default value for the dropdown menu, if one is needed.
We will create three dropdown menus: one for selecting the x-axis feature, one for selecting the y-axis feature, and one for choosing a feature to color-code the points. The code for the dropdown menus (and subtitles) is:
html.Div([
    html.H3("x-axis feature"),
    dcc.Dropdown(id='x-axis',
                 options=[{'label': i, 'value': i} for i in ft_columns],
                 value='ftbiden'),

    html.H3("y-axis feature"),
    dcc.Dropdown(id='y-axis',
                 options=[{'label': i, 'value': i} for i in ft_columns],
                 value='fttrump'),

    html.H3("colors"),
    dcc.Dropdown(id='color',
                 options=[{'label': i, 'value': i} for i in cat_columns])
], style={'width': '25%', 'float': 'left'})
We are naming these elements 'x-axis', 'y-axis', and 'color' respectively, and we will connect these menus to the scatterplot by referring to these names. For the x and y-axis menus we use ft_columns, which is a list we created earlier of all the names of the feeling thermometer columns in anes from fttrump to ftimmig, for the options inside the menu. The code uses a list comprehension to construct a list of dictionaries that set both the label and the value of each dropdown list item to the column name. If we wanted, we could have created a second list with the original column names under value and more presentable names (“Joe Biden” instead of ftbiden) under label. The color dropdown menu uses cat_columns, which contains sex, partyID, vote, and ideology: we can color-code the points according to the categories of any of these features. Finally, the code sets ftbiden and fttrump to be the default features listed on the x and y-axes, so that they appear when the dashboard is first loaded. We do not set a default value for color, which will create a scatterplot without color-coding unless we choose an option under this menu.
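As a sketch of the more presentable labels mentioned above, the options list can be built from a dictionary that maps column names to display names. The pretty_names dictionary and the make_options() helper are hypothetical names introduced for illustration, and the ft_columns list here is only a small stand-in for the real one.

```python
# Illustrative stand-in for the real ft_columns list built from the anes dataframe.
ft_columns = ['fttrump', 'ftbiden', 'ftimmig']

# Hypothetical display names; columns without an entry keep their raw name.
pretty_names = {'fttrump': 'Donald Trump', 'ftbiden': 'Joe Biden'}


def make_options(columns, labels):
    """Build the list of {'label', 'value'} dictionaries that dcc.Dropdown() expects."""
    return [{'label': labels.get(col, col), 'value': col} for col in columns]


options = make_options(ft_columns, pretty_names)
for option in options:
    print(option)
```

The value of each option is still the original column name, so the rest of the dashboard code can keep using it to select data; only the text shown in the menu changes.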
To place the scatterplot on our dashboard, we type:
html.Div([
    dcc.Graph(id="graph")
], style={'width': '70%', 'float': 'right'})
All we need to do at this point is set the id of a call to dcc.Graph(). We will use this id next, outside of html.Div(), to create the scatterplot and connect it to the dropdown menus.
To create the scatterplot, we write two blocks of code outside the list of elements within html.Div(). First we write a callback block that connects the user-supplied values from the dropdown menus to the arguments of a function. Second, we write a function that generates the scatterplot. These two blocks are linked in a way that is not immediately obvious: the inputs for the callback block must be listed in exactly the same order as the arguments in the subsequent function that they refer to.
The callback block for this dashboard is:
@app.callback(Output(component_id="graph", component_property="figure"),
              [Input(component_id='x-axis', component_property="value"),
               Input(component_id='y-axis', component_property="value"),
               Input(component_id='color', component_property="value")])
The @app.callback() decorator takes two arguments: an output and a list of inputs. The output Output(component_id="graph",component_property="figure") takes the output of the function we are about to write and places it in the dashboard where we’ve created an element with the id string equal to 'graph'. component_property="figure" tells the function that this output is a figure. The input is a list with three elements. The first element Input(component_id='x-axis',component_property="value") sets the first argument of the function we are about to write to the user-specified value of the dropdown menu for the x-axis, which has id='x-axis'. The next two elements set the second and third arguments of that function to the values of the dropdown menus for the y-axis and the color-coding feature.
Finally, we write a function that generates the scatterplot with the following code:
def make_figure(x, y, color):
    return px.scatter(
        anes_ft,
        x=x,
        y=y,
        color=color,
        trendline='ols',
        hover_data=['sex', 'partyID', 'vote', 'ideology'],
        height=700,
        opacity = .25
    )
It doesn’t matter what the function is called, so long as it has three arguments to match the three inputs we specified in @app.callback(), and one output. This function has three parameters, x, y, and color, which are passed to the px.scatter() function. The graph has other settings as well (the data, trendline, hover data, height, and opacity), but only x, y, and color can be changed by the user. The return statement sets the plotly scatterplot to be the output of the function. The connections here can be hard to trace: we start with the dropdown menus in html.Div(), whose values are passed to the input argument of @app.callback(), which passes these inputs to the make_figure() function that we defined. make_figure() generates a scatterplot, which is passed back to @app.callback() and sent to the dcc.Graph() element inside html.Div(). The result is a scatterplot that accepts user inputs via the dropdown menus, placed on the dashboard the way we specified with the style arguments.
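To make the order-matching rule concrete, here is a toy stand-in for the wiring written in plain Python. It is not how dash is implemented: ToyApp, its callback() method, and set_value() are hypothetical names invented for this illustration, and the "figure" is just a dictionary. The only point is that the list of input ids is matched to the function's arguments by position.

```python
class ToyApp:
    """A toy model of callback wiring (an illustration, NOT dash's real machinery)."""

    def __init__(self):
        self.state = {}       # component id -> current value
        self.callbacks = []   # (output id, list of input ids, function)

    def callback(self, output_id, input_ids):
        def register(func):
            self.callbacks.append((output_id, input_ids, func))
            return func
        return register

    def set_value(self, component_id, value):
        """Simulate a user choosing a value, then rerun any dependent callbacks."""
        self.state[component_id] = value
        for output_id, input_ids, func in self.callbacks:
            if component_id in input_ids:
                # Inputs are handed to the function by POSITION, in listed order.
                args = [self.state.get(i) for i in input_ids]
                self.state[output_id] = func(*args)


toy = ToyApp()


@toy.callback("graph", ["x-axis", "y-axis", "color"])
def make_figure(x, y, color):
    # Stand-in for px.scatter(): just record which values arrived where.
    return {"x": x, "y": y, "color": color}


toy.set_value("x-axis", "ftbiden")
toy.set_value("y-axis", "fttrump")
print(toy.state["graph"])
```

Because no color has been chosen, the third argument arrives as None, which mirrors the dashboard drawing the scatterplot without color-coding until the user picks an option.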
The complete dashboard is as follows:
app = JupyterDash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div(
    [
        html.H1("Exploring the 2019 American National Election Pilot Study"),

        dcc.Markdown(children = markdown_text),

        html.H2("Comparing Trump and Biden Voters"),
        dcc.Graph(figure=table),

        html.H2("Vote Choice By Party"),
        dcc.Graph(figure=fig_bar),

        html.H2("Distribution of Support for Political Figures"),
        dcc.Graph(figure=fig_vio),

        html.Div([
            html.H2("Vote Choice By State"),
            dcc.Graph(figure=fig_map)
        ], style = {'width':'48%', 'float':'left'}),

        html.Div([
            html.H2("Support by Age Group"),
            dcc.Graph(figure=fig_line)
        ], style = {'width':'48%', 'float':'right'}),

        html.H2("Feeling Thermometer Scatterplot"),

        html.Div([
            html.H3("x-axis feature"),
            dcc.Dropdown(id='x-axis',
                         options=[{'label': i, 'value': i} for i in ft_columns],
                         value='ftbiden'),

            html.H3("y-axis feature"),
            dcc.Dropdown(id='y-axis',
                         options=[{'label': i, 'value': i} for i in ft_columns],
                         value='fttrump'),

            html.H3("colors"),
            dcc.Dropdown(id='color',
                         options=[{'label': i, 'value': i} for i in cat_columns])
        ], style={'width': '25%', 'float': 'left'}),

        html.Div([
            dcc.Graph(id="graph")
        ], style={'width': '70%', 'float': 'right'})
    ]
)

@app.callback(Output(component_id="graph", component_property="figure"),
              [Input(component_id='x-axis', component_property="value"),
               Input(component_id='y-axis', component_property="value"),
               Input(component_id='color', component_property="value")])
def make_figure(x, y, color):
    return px.scatter(
        anes_ft,
        x=x,
        y=y,
        color=color,
        trendline='ols',
        hover_data=['sex', 'partyID', 'vote', 'ideology'],
        height=700,
        opacity = .25
    )

if __name__ == '__main__':
    app.run_server(mode='inline', debug=True, port=8050)
12.3.2.7. Deploying the Dashboard Using a Free Hosting Service#
If you want to distribute your dashboard to an audience, a manager, or a client, sharing a URL that links to the dashboard is better than sharing a notebook or a Python script file. The best free service for hosting dashboards is called Heroku. It can be challenging to get a dashboard running on Heroku, but once you do, your dashboard will be accessible with a URL of the form yourappname.herokuapp.com. For example, the ANES dashboard we designed above is available at https://anespilot2019.herokuapp.com/.
If you want to deploy an app, the following steps worked for me.
1. Make sure you have an account on GitHub. If you have an account, sign in. If you don’t, create a new GitHub account.
2. Navigate to my GitHub repo for the ANES Heroku app: jkropko/dash-heroku-template
3. Push the button marked “Fork” in the upper-right corner of the screen. This button creates a copy of the repository under your own GitHub account. This copy belongs to you and you can manipulate it as you see fit. Just make sure you are working with your copy, not mine, by making sure your username appears in the upper-left corner and not “jkropko”.
4. Collect all of the code needed to run an app, including the package import, data loading, and cleaning steps, and the code to generate the individual elements that populate the dashboard. Start a new Jupyter notebook and paste all of this code into a single cell. If you used jupyterdash, change the code to regular dash by changing app = JupyterDash(__name__, external_stylesheets=external_stylesheets) to app = dash.Dash(__name__, external_stylesheets=external_stylesheets) and app.run_server(mode='inline', debug=True) to app.run_server(debug=True).
5. If your code depends on any local files, upload these files to your new GitHub dash-heroku-template repository by clicking Add File and Upload Files, then pressing Commit. You will then see these files on the main page of the repository. Click on the file you want to use in your code, then click on raw. Copy the URL and paste it into your code wherever you are loading the file. That ensures that all of the code can work 100% online without any need for local storage on your computer.
6. Run the cell that contains all of your code, and make sure it runs without any errors.
7. On your copy of the dash-heroku-template GitHub repo, click on “app.py”. You will see Python code for creating a dash app. Press the pencil button to edit this file. Copy your code from step 6 and replace the code in this file with your own code.
8. On your dash-heroku-template page, click on “requirements.txt”. If you are using any Python packages that are not already listed here, add them. Set them to be greater than or equal to the version number of the package you are using. (To check on the version number of a package, type pip show and the package name.)
9. Go to https://www.heroku.com/home and sign up for a free account.
10. Once you are signed up and arrive back at the main page, click on the button in the upper-right with three horizontal bars. Click on “Dashboard”. On the Dashboard page, click “New” and “Create new app”.
11. Choose a name for your app. This name has to be unique among all of the apps that are hosted on Heroku. Choose a descriptive but short name for the app.
12. Under “Deployment Method” select GitHub. Type dash-heroku-template in the repo search bar. It should appear below with a button marked “Connect”. Press this button.
13. Under “Manual Deploy” click on Deploy Branch. Wait a couple of minutes for Heroku to parse all of the code on your GitHub repo.
14. With any luck, you will see a message that reads “Your app was successfully deployed.” Click on View and it will take you to the URL for your app. If you can see your dashboard, congratulations, your app is live and you can share this URL.
15. If your app encountered an issue, click on More in the upper-right corner of the dashboards screen, and click View Logs. That will take you to the output Heroku provides while attempting to launch your app. If there are any error messages you will see them here and you can try to debug your code. To make changes to your app, edit the “app.py” document on your dash-heroku-template GitHub repo. Once you commit these changes, your Heroku app will relaunch with the new code automatically.
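For the “requirements.txt” step above, the installed version of each package can also be looked up from Python rather than by running pip show one package at a time. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); requirement_line() is a hypothetical helper name, and the list of package names is only an example that should be replaced with the packages your own app imports.

```python
from importlib.metadata import PackageNotFoundError, version


def requirement_line(package):
    """Return 'package>=installed_version', or None if the package is not installed."""
    try:
        return f"{package}>={version(package)}"
    except PackageNotFoundError:
        return None


# Example package names only; substitute the packages your app actually uses.
for name in ["dash", "plotly", "pandas"]:
    line = requirement_line(name)
    print(line if line else f"# {name} is not installed in this environment")
```

Each printed line that is not a comment can be pasted into “requirements.txt” as is.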