Skip to content

Fixes to visualizations throughout the book #160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 44 commits into from
Jul 14, 2023
Merged

Fixes to visualizations throughout the book #160

merged 44 commits into from
Jul 14, 2023

Conversation

joelostblom
Copy link
Contributor

@joelostblom joelostblom commented Feb 15, 2023

@lheagy @trevorcampbell I had some free time and started fixing some of the open viz issues we have and noticed a few things along the way, so I did a full review on the viz sections of the entire book. Although this PR includes quite a lot of changes, most are clarifying in nature, either making the explanations more intuitive or slightly changing how we create a viz. I also removed any leftovers from the ggplot translation to make it easier for students when they see altair code online such as using the documentation. Lastly I standardized the syntax in all chapters to what was used in the viz chapter so that we don't mix different types of syntax throughout the book. The commit messages include more details about the changes.

I had a quick look at the worksheets and I think the only thing that would need updating there is the scaffold code that students use to match with the syntax in the viz chapter. I can have a more throughout review of the worksheets after you have reviewed this PR, but no hurry.

close #57 close #128

… final version by addressing the missing tick label
The old behavior seems to be a replica of the R plot which was due to how did ggplot handles legends
Previously it was just silently introduced it seems
…staed of using assign and overwriting the entire df
…lumns

All columns in the dataframe should follow the same naming scheme
I went with what was used in most places, including the visualization chapter. This is also the style in the altiar documentation. We can discuss if we want to change this but for now it makes sense to at least have the same syntax everywhere instead of a mix which could be confusing.
@trevorcampbell
Copy link
Contributor

Great stuff @joelostblom -- I imagine this is one for the end of semester (since the course is in full swing at this point)

@joelostblom
Copy link
Contributor Author

Yeah probably, at least not merging it to production before the semester is over for sure.

@joelostblom joelostblom mentioned this pull request May 16, 2023
Copy link
Contributor

@trevorcampbell trevorcampbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing improvement @joelostblom ! I have lots of fiddly comments but overall it's a great update.

source/viz.md Outdated
we use the `color` argument to color the bars according to the `landmass_type`
The plot in {numref}`islands_bar_top` is clearer now,
and allows us to answer our question:
"Which are the top 7 largest landmasses continents?".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't the question -- the question should be

"Are the top 7 largest landmasses continents?"

which is why you need "in the affirmative" after.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is what we have in the question further up:

image

Which to me is more similar to "Which of the top 7 largest landmasses are continents?" (fixing a couple of typos)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep the wording the same throughout if possible -- so "are the continents the largest 7" or the like

(also if the original Q were just "which of the top 7 are continents" I'd just plot 7 instead of whatever we're doing now, 12 or something like that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it makes it more complicated that our original question is two-fold: both "Are the continents (North / South America, Africa, Europe, Asia, Australia, Antarctica) Earth’s seven largest landmasses?" and then "If so, what are the next few largest landmasses after those?". I updated to include both of those with the same phrasing as in the original question to keep it consistent, let me know what you think.

I believe the reason we have 12 instead of 7 is to show of the color encoding which would just contain one color if we only included the top 7 since those are all continents:

image

source/viz.md Outdated
we will add a title to the chart by specifying `title` argument inside `alt.Chart`.
A good plot title should usually contain the take home message that you want readers of the chart to focus on,
e.g. "The Earth's seven largest landmasses are all continents",
but it could sometimes be more general, e.g. "The twelve largest landmasses on Earth".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: this stuff about titles seems to have been removed from the R version. I put a pretty high value on syncing the two books as much as possible to avoid divergent material, so we should decide whether this is included or removed or de-synced from R

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(though obviously there will be differences due to differing syntax / code patterns etc, but the set of "ideas/topics we teach" should flow the same)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be valuable to keep them in sync and would prefer to have this section added to the R book. I think it is important to mention titles and show examples of what an appropriate title looks like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelostblom it took me a bit to dig around in my memory for why we decided against this in R, but I finally remembered: we don't use titles anywhere else in the book. If we're suggesting Ch4 is the guide to making good viz, we should follow our own rules elsewhere (and we do, generally). But we felt strongly that we shouldn't have titles everywhere else in the book -- at a minimum, the publisher wouldn't want them in the print book format. So since we weren't including titles everywhere, we didn't feel it would be good to suggest it in Ch4.

I was originally more on the fence about this, but now I'm a solid "no titles" to avoid causing a huge transitive-closure headache elsewhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the book we do have captions everywhere, which fill a similar need as the titles we are explaining to students and help clarify what is figure contains. I think this is important and we don't want student to produce charts without any explanation (whether it is a title and caption), but removing for now so that we can move forward. Follow up in #179

@trevorcampbell
Copy link
Contributor

trevorcampbell commented Jul 7, 2023

I can have a more throughout review of the worksheets after you have reviewed this PR, but no hurry.

Oh also yeah it would be awesome if you could open a PR that syncs the worksheets/tutorials with your new material too :)

(or at least open an issue that specifies what would need to change)

Copy link
Contributor Author

@joelostblom joelostblom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review @trevorcampbell ! I think I have addressed all your comments or opened issues to follow up so please have a look.

source/viz.md Outdated
we use the `color` argument to color the bars according to the `landmass_type`
The plot in {numref}`islands_bar_top` is clearer now,
and allows us to answer our question:
"Which are the top 7 largest landmasses continents?".
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is what we have in the question further up:

image

Which to me is more similar to "Which of the top 7 largest landmasses are continents?" (fixing a couple of typos)

source/viz.md Outdated
we will add a title to the chart by specifying `title` argument inside `alt.Chart`.
A good plot title should usually contain the take home message that you want readers of the chart to focus on,
e.g. "The Earth's seven largest landmasses are all continents",
but it could sometimes be more general, e.g. "The twelve largest landmasses on Earth".
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be valuable to keep them in sync and would prefer to have this section added to the R book. I think it is important to mention titles and show examples of what an appropriate title looks like.

@trevorcampbell
Copy link
Contributor

@joelostblom thanks for the fixes. I just have that one remaining comment about the wording of the Q. I think we should keep it as consistent as possible throughout the chapter. Don't see a need to change it from what's in the R version and in this one earlier in the chapter

Aside from that, looks good to me!

source/viz.md Outdated
The data for this is stored in the `landmass_type` column.
To use this to color the bars,
we use the `color` argument to color the bars according to the `landmass_type`
TO use this to color the bars,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelostblom typo added here "TO" should be "To"

source/viz.md Outdated
To use this to color the bars,
we use the `color` argument to color the bars according to the `landmass_type`
TO use this to color the bars,
we set `color` encoding to `landmass_type`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelostblom typo "set the"

@joelostblom
Copy link
Contributor Author

@trevorcampbell Thanks for the comments, I think I have addressed everything now!

Copy link
Contributor

@trevorcampbell trevorcampbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's excellent! Thanks so much for the effort @joelostblom -- a nice improvement here.

@joelostblom joelostblom merged commit a182ad5 into main Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify plots Better explanation of alt.X and alt.Y
2 participants