Fixes to visualizations throughout the book #160

Merged
merged 44 commits into from
Jul 14, 2023
Changes from 35 commits
44 commits
a296309
Replace ggplot terminology
joelostblom Feb 2, 2023
acbed46
Correct explanation of maxbins
joelostblom Feb 2, 2023
c78b659
Clarify language and use altair terminology
joelostblom Feb 2, 2023
dd6c49d
Make use of parenthesis consistent within the chapter
joelostblom Feb 2, 2023
456473a
Consistently use multiline syntax
joelostblom Feb 2, 2023
57dcac3
Explain alt.X and alt.Y and be consistent in using it only when there…
joelostblom Feb 2, 2023
a4d6edd
Simplify plot by extending the range of the data instead of manipulat…
joelostblom Feb 2, 2023
8a95df1
Clarify the differences between mark_shape and mark_circle and introd…
joelostblom Feb 3, 2023
91e65ab
Clarify explanation of multiline titles
joelostblom Feb 3, 2023
3f50ead
Emphasize operation by putting it first
joelostblom Feb 3, 2023
2ef115f
Explain the formatting of the log plot more carefully and improve the…
joelostblom Feb 14, 2023
77022c6
Clarify which number is the canadian population
joelostblom Feb 14, 2023
fbb9a76
Increase the chart dimensions to fit all the x axis labels
joelostblom Feb 14, 2023
3f54c1a
Remove legend titles and lay labels out vertically when on top
joelostblom Feb 14, 2023
10a970f
Clarify text around color schemes
joelostblom Feb 14, 2023
409cc6d
Show how to add a tooltip and explain the advantages that brings
joelostblom Feb 14, 2023
968d11a
Include note about the danger of using barplots for measures of centr…
joelostblom Feb 15, 2023
a74406b
Clarify explanation of bar plot section
joelostblom Feb 15, 2023
a50ea8d
Remove redundant titles
joelostblom Feb 15, 2023
a0b111e
Properly explain what a histogram is
joelostblom Feb 15, 2023
f1fc0b4
Add example on how to use maxbins
joelostblom Feb 15, 2023
0860de3
Restructure text and modify line appearance to be more readable
joelostblom Feb 15, 2023
0cc933f
Simplify how to facet charts and improve the explanations
joelostblom Feb 15, 2023
b58e077
Move all the maxbins sections into one place
joelostblom Feb 15, 2023
1df06f0
Change the assignment method of a single column the regular syntax in…
joelostblom Feb 15, 2023
59a08c9
Update language to reflect that we are actually looking at relative e…
joelostblom Feb 15, 2023
b6a364f
Name the new column according to the naming scheme of the existing co…
joelostblom Feb 15, 2023
820b47b
Change first maxbins example to also use relative error
joelostblom Feb 15, 2023
c9113ee
Simplify logic of last figure and make it easier to read
joelostblom Feb 15, 2023
d93e6a4
Standardize visualization syntax across chapters
joelostblom Feb 15, 2023
b3e01af
Commit changes to saved svg chart
joelostblom Feb 15, 2023
461371e
Add explicit note regarding code syntax
joelostblom Feb 15, 2023
a6d2768
Explicitly mention re-using chart variable
joelostblom Feb 15, 2023
b06dcaa
Extend faded prediction area to the axes limits in stead of having a …
joelostblom Feb 15, 2023
3f8ac80
Commit changes to saved png chart
joelostblom Feb 15, 2023
3f2805d
viz ref in intro
trevorcampbell Jul 7, 2023
40d1eb6
Fix scale to not squish horizontally
joelostblom Jul 12, 2023
379c54f
Improve language
joelostblom Jul 12, 2023
d992bc7
Add explanation of underlines in numbers
joelostblom Jul 12, 2023
cdbf5b8
Improve language
joelostblom Jul 12, 2023
5ed32f0
Clarify question wording
joelostblom Jul 13, 2023
b531709
Remove bar chart explanation
joelostblom Jul 14, 2023
03b6d36
Fix wording
joelostblom Jul 14, 2023
888c07e
Remove title section
joelostblom Jul 14, 2023
60 changes: 29 additions & 31 deletions source/classification1.md
Expand Up @@ -289,14 +289,10 @@ perimeter and concavity variables. Recall that `altair's` default palette
is colorblind-friendly, so we can stick with that here.

```{code-cell} ipython3
perim_concav = (
alt.Chart(cancer)
.mark_circle()
.encode(
x=alt.X("Perimeter", title="Perimeter (standardized)"),
y=alt.Y("Concavity", title="Concavity (standardized)"),
color=alt.Color("Class", title="Diagnosis"),
)
perim_concav = alt.Chart(cancer).mark_circle().encode(
x=alt.X("Perimeter", title="Perimeter (standardized)"),
y=alt.Y("Concavity", title="Concavity (standardized)"),
color=alt.Color("Class", title="Diagnosis"),
)
perim_concav
```
Expand Down Expand Up @@ -1440,14 +1436,10 @@ rare_cancer = pd.concat((
cancer[cancer["Class"] == 'Malignant'].head(3)
))

rare_plot = (
alt.Chart(rare_cancer)
.mark_circle()
.encode(
x=alt.X("Perimeter", title="Perimeter (standardized)"),
y=alt.Y("Concavity", title="Concavity (standardized)"),
color=alt.Color("Class", title="Diagnosis"),
)
rare_plot = alt.Chart(rare_cancer).mark_circle().encode(
x=alt.X("Perimeter", title="Perimeter (standardized)"),
y=alt.Y("Concavity", title="Concavity (standardized)"),
color=alt.Color("Class", title="Diagnosis"),
)
rare_plot
```
Expand Down Expand Up @@ -1554,10 +1546,10 @@ knn.fit(X=rare_cancer.loc[:, ["Perimeter", "Concavity"]], y=rare_cancer["Class"]

# create a prediction pt grid
per_grid = np.linspace(
rare_cancer["Perimeter"].min(), rare_cancer["Perimeter"].max(), 50
rare_cancer["Perimeter"].min() * 1.05, rare_cancer["Perimeter"].max() * 1.05, 50
)
con_grid = np.linspace(
rare_cancer["Concavity"].min(), rare_cancer["Concavity"].max(), 50
rare_cancer["Concavity"].min() * 1.05, rare_cancer["Concavity"].max() * 1.05, 50
)
pcgrid = np.array(np.meshgrid(per_grid, con_grid)).reshape(2, -1).T
pcgrid = pd.DataFrame(pcgrid, columns=["Perimeter", "Concavity"])
Expand Down Expand Up @@ -1593,14 +1585,16 @@ prediction_plot = (
"Perimeter",
title="Perimeter (standardized)",
scale=alt.Scale(
domain=(rare_cancer["Perimeter"].min(), rare_cancer["Perimeter"].max())
domain=(rare_cancer["Perimeter"].min() * 1.05, rare_cancer["Perimeter"].max() * 1.05),
nice=False
),
),
y=alt.Y(
"Concavity",
title="Concavity (standardized)",
scale=alt.Scale(
domain=(rare_cancer["Concavity"].min(), rare_cancer["Concavity"].max())
domain=(rare_cancer["Concavity"].min() * 1.05, rare_cancer["Concavity"].max() * 1.05),
nice=False
),
),
color=alt.Color("Class", title="Diagnosis"),
Expand Down Expand Up @@ -1684,14 +1678,16 @@ rare_plot = (
"Perimeter",
title="Perimeter (standardized)",
scale=alt.Scale(
domain=(rare_cancer["Perimeter"].min(), rare_cancer["Perimeter"].max())
domain=(rare_cancer["Perimeter"].min() * 1.05, rare_cancer["Perimeter"].max() * 1.05),
nice=False
),
),
y=alt.Y(
"Concavity",
title="Concavity (standardized)",
scale=alt.Scale(
domain=(rare_cancer["Concavity"].min(), rare_cancer["Concavity"].max())
domain=(rare_cancer["Concavity"].min() * 1.05, rare_cancer["Concavity"].max() * 1.05),
nice=False
),
),
color=alt.Color("Class", title="Diagnosis"),
Expand Down Expand Up @@ -1808,10 +1804,10 @@ import numpy as np

# create the grid of area/smoothness vals, and arrange in a data frame
are_grid = np.linspace(
unscaled_cancer["Area"].min(), unscaled_cancer["Area"].max(), 50
unscaled_cancer["Area"].min() * 0.95, unscaled_cancer["Area"].max() * 1.05, 50
)
smo_grid = np.linspace(
unscaled_cancer["Smoothness"].min(), unscaled_cancer["Smoothness"].max(), 50
unscaled_cancer["Smoothness"].min() * 0.95, unscaled_cancer["Smoothness"].max() * 1.05, 50
)
asgrid = np.array(np.meshgrid(are_grid, smo_grid)).reshape(2, -1).T
asgrid = pd.DataFrame(asgrid, columns=["Area", "Smoothness"])
Expand All @@ -1835,17 +1831,19 @@ unscaled_plot = (
"Area",
title="Area",
scale=alt.Scale(
domain=(unscaled_cancer["Area"].min(), unscaled_cancer["Area"].max())
),
domain=(unscaled_cancer["Area"].min() * 0.95, unscaled_cancer["Area"].max() * 1.05),
nice=False
)
),
y=alt.Y(
"Smoothness",
title="Smoothness",
scale=alt.Scale(
domain=(
unscaled_cancer["Smoothness"].min(),
unscaled_cancer["Smoothness"].max(),
)
unscaled_cancer["Smoothness"].min() * 0.95,
unscaled_cancer["Smoothness"].max() * 1.05,
),
nice=False
),
),
color=alt.Color("Class", title="Diagnosis"),
Expand All @@ -1857,8 +1855,8 @@ prediction_plot = (
alt.Chart(prediction_table)
.mark_point(opacity=0.05, filled=True, size=300)
.encode(
x=alt.X("Area"),
y=alt.Y("Smoothness"),
x="Area",
y="Smoothness",
color=alt.Color("Class", title="Diagnosis"),
)
)
Expand Down
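The hunks above pad the plotting range by multiplying the minimum and maximum by a factor. A multiplicative pad only widens the range when the sign cooperates: `min() * 1.05` moves a negative minimum outward but a positive one inward, which appears to be why the hunks use `1.05` for the standardized data and `0.95` for the unscaled data. The sketch below shows one sign-independent alternative that pads by a fraction of the data span. `padded_domain` is a hypothetical helper written for illustration; it is not part of this PR or of `altair`.

```python
def padded_domain(lo, hi, frac=0.05):
    """Widen the interval (lo, hi) by `frac` of its span on each side.

    Hypothetical helper for illustration; not part of the PR.
    """
    if hi < lo:
        raise ValueError("hi must be >= lo")
    pad = (hi - lo) * frac
    return lo - pad, hi + pad


# A multiplicative pad moves a positive minimum inward (above the data minimum).
inward_min = 2.0 * 1.05

# Padding by a fraction of the span widens the range regardless of sign.
positive_range = padded_domain(2.0, 4.0)
mixed_range = padded_domain(-2.0, 4.0)

print(inward_min)
print(positive_range)
print(mixed_range)
```

The returned pair could be passed both to `np.linspace` for the prediction grid and to `alt.Scale(domain=...)`, so that the grid and the axis limits stay in sync.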
83 changes: 44 additions & 39 deletions source/classification2.md
Expand Up @@ -330,16 +330,11 @@ cancer['Class'] = cancer['Class'].replace({
# create scatter plot of tumor cell concavity versus smoothness,
# labeling the points by diagnosis class

perim_concav = (
alt.Chart(cancer)
.mark_circle()
.encode(
x="Smoothness",
y="Concavity",
color=alt.Color("Class", title="Diagnosis"),
)
perim_concav = alt.Chart(cancer).mark_circle().encode(
x="Smoothness",
y="Concavity",
color=alt.Color("Class", title="Diagnosis"),
)

perim_concav
```

Expand Down Expand Up @@ -1081,19 +1076,15 @@ as shown in {numref}`fig:06-find-k`.
```{code-cell} ipython3
:tags: [remove-output]

accuracy_vs_k = (
alt.Chart(accuracies_grid)
.mark_line(point=True)
.encode(
x=alt.X(
"n_neighbors",
title="Neighbors",
),
y=alt.Y(
"mean_test_score",
title="Accuracy estimate",
scale=alt.Scale(domain=(0.85, 0.90)),
),
accuracy_vs_k = alt.Chart(accuracies_grid).mark_line(point=True).encode(
x=alt.X(
"n_neighbors",
title="Neighbors",
),
y=alt.Y(
"mean_test_score",
title="Accuracy estimate",
scale=alt.Scale(domain=(0.85, 0.90)),
)
)

Expand Down Expand Up @@ -1170,19 +1161,15 @@ large_accuracies_grid = pd.DataFrame(
).cv_results_
)

large_accuracy_vs_k = (
alt.Chart(large_accuracies_grid)
.mark_line(point=True)
.encode(
x=alt.X(
"param_kneighborsclassifier__n_neighbors",
title="Neighbors",
),
y=alt.Y(
"mean_test_score",
title="Accuracy estimate",
scale=alt.Scale(domain=(0.60, 0.90)),
),
large_accuracy_vs_k = alt.Chart(large_accuracies_grid).mark_line(point=True).encode(
x=alt.X(
"param_kneighborsclassifier__n_neighbors",
title="Neighbors",
),
y=alt.Y(
"mean_test_score",
title="Accuracy estimate",
scale=alt.Scale(domain=(0.60, 0.90)),
)
)

Expand Down Expand Up @@ -1269,10 +1256,10 @@ y = cancer_train["Class"]

# create a prediction pt grid
smo_grid = np.linspace(
cancer_train["Smoothness"].min(), cancer_train["Smoothness"].max(), 100
cancer_train["Smoothness"].min() * 0.95, cancer_train["Smoothness"].max() * 1.05, 100
)
con_grid = np.linspace(
cancer_train["Concavity"].min(), cancer_train["Concavity"].max(), 100
cancer_train["Concavity"].min() - 0.025, cancer_train["Concavity"].max() * 1.05, 100
)
scgrid = np.array(np.meshgrid(smo_grid, con_grid)).reshape(2, -1).T
scgrid = pd.DataFrame(scgrid, columns=["Smoothness", "Concavity"])
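The grid-building idiom in this hunk (`np.meshgrid` followed by `reshape(2, -1).T`) can be hard to read at a glance. The following minimal sketch uses made-up axis values, not the cancer data, to show that the result has one row per combination of the two variables. The names `x_vals`, `y_vals`, `stacked`, and `grid` are chosen here for illustration only.

```python
import numpy as np

# Toy axis values (made up for illustration, not taken from the cancer data).
x_vals = np.linspace(0.0, 1.0, 3)    # 3 points along the first variable
y_vals = np.linspace(10.0, 20.0, 2)  # 2 points along the second variable

# meshgrid returns two (2, 3) arrays; stacking them gives shape (2, 2, 3).
stacked = np.array(np.meshgrid(x_vals, y_vals))

# Flatten each coordinate array, then transpose so each row is one (x, y) pair.
grid = stacked.reshape(2, -1).T

print(grid.shape)
print(grid)
```

In the diff, the same array is then wrapped in a `pd.DataFrame` with named columns so that the classifier can predict a class for every grid point.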
Expand All @@ -1294,8 +1281,26 @@ for k in [1, 7, 20, 300]:
)
.mark_point(opacity=0.2, filled=True, size=20)
.encode(
x=alt.X("Smoothness"),
y=alt.Y("Concavity"),
x=alt.X(
"Smoothness",
scale=alt.Scale(
domain=(
cancer_train["Smoothness"].min() * 0.95,
cancer_train["Smoothness"].max() * 1.05
),
nice=False
)
),
y=alt.Y(
"Concavity",
scale=alt.Scale(
domain=(
cancer_train["Concavity"].min() -0.025,
cancer_train["Concavity"].max() * 1.05
),
nice=False
)
),
color=alt.Color("Class", title="Diagnosis"),
)
)
Expand Down
Binary file modified source/img/faithful_plot.png
2 changes: 1 addition & 1 deletion source/img/faithful_plot.svg
64 changes: 26 additions & 38 deletions source/intro.md
Expand Up @@ -804,13 +804,12 @@ import altair as alt

+++

The fundamental object in `altair` is the `Chart`, which takes a data frame as a single argument: `alt.Chart(ten_lang)`.
The fundamental object in `altair` is the `Chart`, which takes a data frame as an argument: `alt.Chart(ten_lang)`.
With a chart object in hand, we can now specify how we would like the data to be visualized.
We first indicate what kind of geometric mark we want to use to represent the data. Here we set the mark attribute
We first indicate what kind of graphical *mark* we want to use to represent the data. Here we set the mark attribute
of the chart object using the `Chart.mark_bar` function, because we want to create a bar chart.
Next, we need to encode the variables of the data frame using
the `x` (represents the x-axis position of the points) and
`y` (represents the y-axis position of the points) *channels*. We use the `encode()`
Next, we need to *encode* the variables of the data frame using
the `x` and `y` *channels* (which represent the x-axis and y-axis position of the points). We use the `encode()`
function to handle this: we specify that the `language` column should correspond to the x-axis,
and that the `mother_tongue` column should correspond to the y-axis.

Expand Down Expand Up @@ -853,7 +852,7 @@ Bar plot of the ten Aboriginal languages most often reported by Canadian residen
```{index} see: .; chaining methods
```

### Formatting `altair` objects
### Formatting `altair` charts

It is exciting that we can already visualize our data to help answer our
question, but we are not done yet! We can (and should) do more to improve the
Expand All @@ -865,28 +864,27 @@ example above, Python uses the column name `mother_tongue` as the label for the
y axis, but most people will not know what that is. And even if they did, they
will not know how we measured this variable, or the group of people on which the
measurements were taken. An axis label that reads "Mother Tongue (Number of
Canadian Residents)" would be much more informative.
Canadian Residents)" would be much more informative. To make the code easier to
read, we're spreading it out over multiple lines just as we did in the previous
section with pandas.

```{index} plot; labels, plot; axis labels
```

Adding additional labels to our visualizations that we create in `altair` is
one common and easy way to improve and refine our data visualizations. We can add titles for the axes
in the `altair` objects using `alt.X` and `alt.Y` with the `title` argument to make
the axes titles more informative.
the axes titles more informative (you will learn more about `alt.X` and `alt.Y` in the visualization chapter).
Again, since we are specifying
words (e.g. `"Mother Tongue (Number of Canadian Residents)"`) as arguments to
`alt.X` and `alt.Y`, we surround them with double quotation marks. We can do many other modifications
to format the plot further, and we will explore these in the {ref}`viz` chapter.

```{code-cell} ipython3
barplot_mother_tongue = (
alt.Chart(ten_lang)
.mark_bar().encode(
x=alt.X('language', title='Language'),
y=alt.Y('mother_tongue', title='Mother Tongue (Number of Canadian Residents)')
))

barplot_mother_tongue = alt.Chart(ten_lang).mark_bar().encode(
x=alt.X('language', title='Language'),
y=alt.Y('mother_tongue', title='Mother Tongue (Number of Canadian Residents)')
)
```


Expand Down Expand Up @@ -915,13 +913,10 @@ To accomplish this, we will swap the x and y coordinate axes:


```{code-cell} ipython3
barplot_mother_tongue_axis = (
alt.Chart(ten_lang)
.mark_bar().encode(
x=alt.X('mother_tongue', title='Mother Tongue (Number of Canadian Residents)'),
y=alt.Y('language', title='Language')
))

barplot_mother_tongue_axis = alt.Chart(ten_lang).mark_bar().encode(
x=alt.X('mother_tongue', title='Mother Tongue (Number of Canadian Residents)'),
y=alt.Y('language', title='Language')
)
```

```{code-cell} ipython3
Expand Down Expand Up @@ -951,13 +946,10 @@ the `sort` argument, which orders a variable (here `language`) based on the
values of the variable (`mother_tongue`) on the x-axis.

```{code-cell} ipython3
ordered_barplot_mother_tongue = (
alt.Chart(ten_lang)
.mark_bar().encode(
x=alt.X('mother_tongue', title='Mother Tongue (Number of Canadian Residents)'),
y=alt.Y('language', sort='x', title='Language')
))

ordered_barplot_mother_tongue = alt.Chart(ten_lang).mark_bar().encode(
x=alt.X('mother_tongue', title='Mother Tongue (Number of Canadian Residents)'),
y=alt.Y('language', sort='x', title='Language')
)
```

+++
Expand Down Expand Up @@ -1028,17 +1020,13 @@ ten_lang = (
can_lang.loc[can_lang["category"] == "Aboriginal languages", ["language", "mother_tongue"]]
.sort_values(by="mother_tongue", ascending=False)
.head(10)
)
)

# create the visualization
ten_lang_plot = (
alt.Chart(ten_lang)
.mark_bar().encode(
x=alt.X('mother_tongue', title='Mother Tongue (Number of Canadian Residents)'),
y=alt.Y('language', sort='x', title='Language')
))


ten_lang_plot = alt.Chart(ten_lang).mark_bar().encode(
x=alt.X('mother_tongue', title='Mother Tongue (Number of Canadian Residents)'),
y=alt.Y('language', sort='x', title='Language')
)
```

```{code-cell} ipython3
Expand Down