Polishing flow on the Ch3 PR #186

trevorcampbell · 2023-07-27T23:00:00Z

This is an edit on #97 -- since I made quite a few changes, I decided to open a separate PR that will be merged into that PR itself prior to going into main.

trevorcampbell · 2023-07-27T23:01:23Z

source/wrangling.md

 one can use in the `[]` to select subsets of rows.

 +++

+### Extracting columns by name


Reasoning for this edit: This is essentially a repeat of material from Ch1. But we do the same in the R version, and it doesn't hurt to have it here, especially if someone is going to search for how to subset rows/cols and end up in Ch 3 (maybe missing Ch 1 entirely). The section is named "extract rows or columns", so a bit odd not to discuss columns at all.
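
For reference, a minimal sketch of the column extraction this new subsection repeats from Ch1. The data frame below is a hypothetical stand-in for `tidy_lang` with made-up values, not the book's actual data set.

```python
import pandas as pd

# Hypothetical stand-in for tidy_lang (values are invented for illustration)
tidy_lang = pd.DataFrame({
    "region": ["Toronto", "Montréal"],
    "category": ["Official languages", "Official languages"],
    "language": ["English", "French"],
    "most_at_home": [100, 200],
    "most_at_work": [150, 250],
})

# A list of column names inside [] returns a data frame with just those columns
subset = tidy_lang[["language", "region"]]

# A single name inside [] returns one column as a Series
language = tidy_lang["language"]
```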

trevorcampbell · 2023-07-27T23:02:46Z

source/wrangling.md

-Suppose we wanted to select only the columns `language`, `region`,
-`most_at_home` and `most_at_work` from the `tidy_lang` data set. Using what we
-learned in the chapter on {ref}`intro`, we would pass all of these column names into the square brackets.
+In addition to simultaneous subsetting of rows and columns, `loc[]` has two


@joelostblom I think what was missing before was an emphasis on why we would care about loc[]. I've written it here to reference two special abilities of loc beyond [] -- ranges and logical statements to select columns.

I like this additional motivation!
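
The two abilities mentioned above (column ranges and logical statements on column names) could be sketched roughly as follows. The data frame is a hypothetical toy with invented values; only the column names echo those in the diff.

```python
import pandas as pd

# Hypothetical toy data frame (values are invented for illustration)
toy = pd.DataFrame({
    "region": ["Toronto", "Montréal"],
    "language": ["English", "French"],
    "most_at_home": [100, 200],
    "most_at_work": [150, 250],
    "lang_known": [300, 400],
})

# Ability 1: a range of column labels; both endpoints are included
range_cols = toy.loc[:, "language":"most_at_work"]

# Ability 2: a boolean mask built from a logical statement on the column names
most_cols = toy.loc[:, toy.columns.str.startswith("most")]
```

Plain `[]` does not accept a label range for columns, which is why `loc[]` is needed for the first case.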

trevorcampbell · 2023-07-27T23:03:26Z

source/wrangling.md


 ```{code-cell} ipython3
 :tags: ["output_scroll"]
-tidy_lang[["language", "region", "most_at_home", "most_at_work"]]


It was weird to have the first example in the "using loc" section be []. Now the first example is a repeat of what we did in Ch 1 (basic usage), followed by "here's what else you can do with loc"

trevorcampbell · 2023-07-27T23:04:46Z

source/wrangling.md


 ```{code-cell} ipython3
 region_lang.groupby("region")["most_at_home"].agg(["min", "max"])
 ```

 The resulting dataframe has `region` as an index name.
-This is similar to what happened when we reshaped data frames in the previous chapter,
+This is similar to what happened when we used the `pivot` function
+in the section on {ref}`pivot-wider`;
 and just as we did then,


it's not in the previous chapter -- it's in the pivot section
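
A small sketch of the behaviour described in this hunk: after `groupby` and `agg`, the grouping column becomes the index. The hunk is truncated before it says how the chapter undoes this, so the `reset_index` call below is an assumption (it is one common approach), and the data are invented.

```python
import pandas as pd

# Hypothetical stand-in for region_lang (values are invented for illustration)
region_lang = pd.DataFrame({
    "region": ["A", "A", "B"],
    "most_at_home": [1, 5, 3],
})

summary = region_lang.groupby("region")["most_at_home"].agg(["min", "max"])
index_name = summary.index.name  # the grouping column is now the index

# Assumption: reset_index is used to move "region" back into a regular column
flat = summary.reset_index()
```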

trevorcampbell · 2023-07-27T23:05:16Z

source/wrangling.md

-the mean and standard deviation of all of the columns between `"mother_tongue"` and `"lang_known"`.
-We use `[]` to specify the columns and then `agg` to ask for both the `mean` and `std`.
+you can first use `[]` or `.loc[]` to select those columns,
+and then ask for the summary statistic


your example right after used .loc[] so I added that to the text
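
As an illustration of the revised wording, here is a sketch that selects a column range with `.loc[]` and then asks for two summary statistics. The data frame is a hypothetical stand-in for `region_lang` with invented values.

```python
import pandas as pd

# Hypothetical stand-in for region_lang (values are invented for illustration)
region_lang = pd.DataFrame({
    "region": ["A", "B", "C"],
    "mother_tongue": [10, 20, 30],
    "most_at_home": [1, 2, 3],
    "most_at_work": [4, 5, 6],
    "lang_known": [7, 8, 9],
})

# Select the columns first, then compute the mean and standard deviation of each
stats = region_lang.loc[:, "mother_tongue":"lang_known"].agg(["mean", "std"])
```

The result has one row per statistic and one column per selected variable.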

trevorcampbell · 2023-07-27T23:07:34Z

source/wrangling.md


 ```{code-cell} ipython3
-tidy_lang[:10]


I got rid of this shorthand example entirely. I think students will confuse it with the `[]` operator introduced earlier. They might (and eventually will) see it in the wild, but I think the priority here is to make sure everyone understands what we teach and to keep things contained, not necessarily to cover everything they'll see beyond the class.

trevorcampbell · 2023-07-27T23:08:00Z

source/wrangling.md

+
+```{code-cell} ipython3
+:tags: ["output_scroll"]
+tidy_lang.loc[:, :"language"]


I figured including a "before" example would be helpful here too -- this syntax is a bit weird, so being more explicit is useful
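
To make the open-ended slice concrete, a sketch of the "before" form next to its "after" counterpart. The data frame is a hypothetical stand-in for `tidy_lang`; the column order and values are invented.

```python
import pandas as pd

# Hypothetical stand-in for tidy_lang (column order and values are invented)
tidy_lang = pd.DataFrame({
    "region": ["Toronto", "Montréal"],
    "category": ["Official languages", "Official languages"],
    "language": ["English", "French"],
    "most_at_home": [100, 200],
    "most_at_work": [150, 250],
})

# "Before": every column from the first one up to and including "language"
before = tidy_lang.loc[:, :"language"]

# "After": every column from "language" through the last one
after = tidy_lang.loc[:, "language":]
```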

trevorcampbell · 2023-07-27T23:08:54Z

@joelostblom this PR edits your other PR -- if you are happy with these edits, we can merge this one and then merge your PR into main

joelostblom

This looks good to me, thanks for making these changes!

joelostblom · 2023-07-29T20:32:47Z

source/wrangling.md

-Suppose we wanted to select only the columns `language`, `region`,
-`most_at_home` and `most_at_work` from the `tidy_lang` data set. Using what we
-learned in the chapter on {ref}`intro`, we would pass all of these column names into the square brackets.
+In addition to simultaneous subsetting of rows and columns, `loc[]` has two


I like this additional motivation!

* Align explanation of loc and iloc with the intro chapter
* Explain aggregations more intuitively
* Remove loc from groupby section and simplify it
* Add mention of value counts for group sizes
  Preferred over size for me since it has `normalize`
* Note that [] cannot be used for ranges and we need loc[] for that
* Update startswith with the correct explanation
* Fix typo
* tc polish on wrangling (#186)

---------

Co-authored-by: Trevor Campbell <[email protected]>

tc polish on wrangling

22362c1

trevorcampbell requested a review from joelostblom July 27, 2023 23:00

trevorcampbell commented Jul 27, 2023


joelostblom approved these changes Jul 29, 2023


joelostblom merged commit 28608ab into ch3-suggestions Jul 29, 2023
