Commit 7d6bdb3

Merge pull request #285 from UBC-DSCI/bar-mean
Caveats of bar plots & plot titles & captions
2 parents ccd6575 + d14311a commit 7d6bdb3

1 file changed: +36 -24 lines changed


source/viz.md

+36-24
@@ -1047,7 +1047,7 @@ can_lang_plot_theme = alt.Chart(can_lang).mark_point(filled=True).encode(
10471047
y=alt.Y("mother_tongue_percent")
10481048
.scale(type="log")
10491049
.axis(tickCount=7)
1050-
.title("Mother tongue(percentage of Canadian residents)"),
1050+
.title("Mother tongue (percentage of Canadian residents)"),
10511051
color=alt.Color("category")
10521052
.legend(orient="top")
10531053
.title("")
@@ -1066,7 +1066,7 @@ glue("can_lang_plot_theme", can_lang_plot_theme.properties(height=320, width=420
10661066
:figwidth: 700px
10671067
:name: can_lang_plot_theme
10681068

1069-
Scatter plot of percentage of Canadians reporting a language as their mother tongue vs the primary language at home colored by language category with custom colors.
1069+
Scatter plot of percentage of Canadians reporting a language as their mother tongue vs the primary language at home colored by language category with custom colors and shapes.
10701070
:::
10711071

10721072
The chart above gives a good indication of how the different language categories differ,
@@ -1089,7 +1089,7 @@ can_lang_plot_tooltip = alt.Chart(can_lang).mark_point(filled=True).encode(
10891089
y=alt.Y("mother_tongue_percent")
10901090
.scale(type="log")
10911091
.axis(tickCount=7)
1092-
.title("Mother tongue(percentage of Canadian residents)"),
1092+
.title("Mother tongue (percentage of Canadian residents)"),
10931093
color=alt.Color("category")
10941094
.legend(orient="top")
10951095
.title("")
@@ -1112,7 +1112,7 @@ else:
11121112
:figwidth: 700px
11131113
:name: can_lang_plot_tooltip
11141114

1115-
Scatter plot of percentage of Canadians reporting a language as their mother tongue vs the primary language at home colored by language category with custom colors and mouse hover tooltip.
1115+
Scatter plot of percentage of Canadians reporting a language as their mother tongue vs the primary language at home colored by language category with custom colors and mouse hover tooltip.
11161116
:::
11171117

11181118
From the visualization in {numref}`can_lang_plot_tooltip`,
@@ -1163,10 +1163,13 @@ islands_df
11631163
Here, we have a data frame of Earth's landmasses,
11641164
and are trying to compare their sizes.
11651165
The right type of visualization to answer this question is a bar plot.
1166-
In a bar plot, the height of the bar represents the value of a summary statistic
1167-
(usually a size, count, sum, proportion, or percentage).
1168-
They are particularly useful for comparing summary statistics between different
1169-
groups of a categorical variable.
1166+
In a bar plot, the height of each bar represents the value of an *amount*
1167+
(a size, count, proportion, percentage, etc).
1168+
They are particularly useful for comparing counts or proportions across different
1169+
groups of a categorical variable. Note, however, that bar plots should generally not be
1170+
used to display mean or median values, as they hide important information about
1171+
the variation of the data. Instead it's better to show the distribution of
1172+
all the individual data points, e.g., using a histogram, which we will discuss further in {numref}`histogramsviz`.
11701173

11711174
```{index} altair; mark_bar
11721175
```
@@ -1191,7 +1194,7 @@ glue("islands_bar", islands_bar, display=False)
11911194
:figwidth: 400px
11921195
:name: islands_bar
11931196

1194-
Bar plot of all Earth's landmasses' size with squished labels.
1197+
Bar plot of Earth's landmass sizes. The plot is too wide with the default settings.
11951198
:::
11961199

11971200
Alright, not bad! The plot in {numref}`islands_bar` is
@@ -1209,7 +1212,7 @@ so that the labels are on the y-axis and we don't have to tilt our head to read
12091212
```{note}
12101213
Recall that in {numref}`Chapter %s <intro>`, we used `sort_values` followed by `head` to obtain
12111214
the ten rows with the largest values of a variable. We could have instead used the `nlargest` function
1212-
from `pandas` for this purpose. The `nsmallest` and `nlargest` functions achieve the same goal
1215+
from `pandas` for this purpose. The `nsmallest` and `nlargest` functions achieve the same goal
12131216
as `sort_values` followed by `head`, but are slightly more efficient because they are specialized for this purpose.
12141217
In general, it is good to use more specialized functions when they are available!
12151218
```
@@ -1244,12 +1247,10 @@ and allows us to answer our initial questions:
12441247
"Are the seven continents Earth's largest landmasses?"
12451248
and "Which are the next few largest landmasses?".
12461249
However, we could still improve this visualization
1247-
by organizing the bars by landmass size rather than by alphabetical order
1248-
and by coloring the bars based on whether they correspond to a continent.
1249-
The data for this is stored in the `landmass_type` column.
1250-
To use this to color the bars,
1250+
by coloring the bars based on whether they correspond to a continent, and
1251+
by organizing the bars by landmass size rather than by alphabetical order.
1252+
The data for coloring the bars is stored in the `landmass_type` column, so
12511253
we set the `color` encoding to `landmass_type`.
1252-
12531254
To organize the landmasses by their `size` variable,
12541255
we will use the altair `sort` function
12551256
in the y-encoding of the chart.
@@ -1259,18 +1260,28 @@ This plots the values on `y` axis
12591260
in the ascending order of `x` axis values.
12601261
This creates a chart where the largest bar is the closest to the axis line,
12611262
which is generally the most visually appealing when sorting bars.
1262-
If instead
1263-
we want to sort the values on `y-axis` in descending order of `x-axis`,
1264-
we can add a minus sign to reverse the order and specify `sort="-x"`.
1263+
If instead we wanted to sort the values on `y-axis` in descending order of `x-axis`,
1264+
we could add a minus sign to reverse the order and specify `sort="-x"`.
12651265

12661266
```{index} altair; sort
12671267
```
12681268

1269+
To finalize this plot we will customize the axis and legend labels using the `title` method,
1270+
and add a title to the chart by specifying the `title` argument of `alt.Chart`.
1271+
Plot titles are not always required, especially when it would be redundant with an already-existing
1272+
caption or surrounding context (e.g., in a slide presentation with annotations).
1273+
But if you decide to include one, a good plot title should provide the take home message
1274+
that you want readers to focus on, e.g., "Earth's seven largest landmasses are continents,"
1275+
or a more general summary of the information displayed, e.g., "Earth's twelve largest landmasses."
1276+
12691277
```{code-cell} ipython3
1270-
islands_plot_sorted = alt.Chart(islands_top12).mark_bar().encode(
1271-
x="size",
1272-
y=alt.Y("landmass").sort("x"),
1273-
color=alt.Color("landmass_type")
1278+
islands_plot_sorted = alt.Chart(
1279+
islands_top12,
1280+
title="Earth's seven largest landmasses are continents"
1281+
).mark_bar().encode(
1282+
x=alt.X("size").title("Size (1000 square mi)"),
1283+
y=alt.Y("landmass").sort("x").title("Landmass"),
1284+
color=alt.Color("landmass_type").title("Type")
12741285
)
12751286
```
12761287

@@ -1283,7 +1294,7 @@ glue("islands_plot_sorted", islands_plot_sorted, display=True)
12831294
:figwidth: 700px
12841295
:name: islands_plot_sorted
12851296

1286-
Bar plot of size for Earth's largest 12 landmasses colored by whether its a continent with clearer axes and labels.
1297+
Bar plot of size for Earth's largest 12 landmasses, colored by landmass type, with clearer axes and labels.
12871298
:::
12881299

12891300

@@ -1292,6 +1303,7 @@ visualization for answering our original questions. Landmasses are organized by
12921303
their size, and continents are colored differently than other landmasses,
12931304
making it quite clear that all the seven largest landmasses are continents.
12941305

1306+
(histogramsviz)=
12951307
### Histograms: the Michelson speed of light data set
12961308

12971309
```{index} Michelson speed of light
@@ -1348,7 +1360,7 @@ Note that this time,
13481360
we are setting the `y` encoding to `"count()"`.
13491361
There is no `"count()"` column-name in `morley_df`;
13501362
we use `"count()"` to tell `altair`
1351-
that we want to count the number of occurrences of each value in along the x-axis
1363+
that we want to count the number of occurrences of each value in along the x-axis
13521364
(which we encoded as the `Speed` column).
13531365

13541366
```{code-cell} ipython3
