Polishing flow on the Ch3 PR #186

Merged
merged 1 commit into from
Jul 29, 2023

trevorcampbell
Contributor

This is an edit on #97 -- since I made quite a few changes, I decided to open a separate PR that will be merged into that PR itself prior to going into main.

one can use in the `[]` to select subsets of rows.

+++

### Extracting columns by name
Contributor Author

@trevorcampbell trevorcampbell Jul 27, 2023

Reasoning for this edit: This is essentially a repeat of material from Ch 1. But we do the same in the R version, and it doesn't hurt to have it here, especially if someone searches for how to subset rows/cols and ends up in Ch 3 (maybe missing Ch 1 entirely). The section is named "extract rows or columns", so it's a bit odd not to discuss columns at all.

Suppose we wanted to select only the columns `language`, `region`,
`most_at_home` and `most_at_work` from the `tidy_lang` data set. Using what we
learned in the chapter on {ref}`intro`, we would pass all of these column names into the square brackets.
In addition to simultaneous subsetting of rows and columns, `loc[]` has two
Contributor Author

@joelostblom I think what was missing before was an emphasis on why we would care about loc[]. I've written it here to reference two special abilities of loc beyond [] -- ranges and logical statements to select columns.

Contributor

I like this additional motivation!
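
To illustrate the two `loc[]` abilities discussed in this thread (label ranges and logical statements for selecting columns), here is a minimal sketch. It assumes a toy data frame, `toy_lang`, with invented values that only borrows the chapter's column names; it is not the book's `tidy_lang` data.

```python
import pandas as pd

# Hypothetical stand-in for tidy_lang; all values are invented for illustration.
toy_lang = pd.DataFrame({
    "category": ["Official languages", "Official languages"],
    "language": ["English", "French"],
    "region": ["Toronto", "Vancouver"],
    "most_at_home": [100, 200],
    "most_at_work": [150, 250],
})

# Ability 1: a label range. Unlike integer slicing, both endpoints are included.
range_cols = toy_lang.loc[:, "language":"most_at_home"]

# Ability 2: a logical (boolean) mask computed from the column labels.
most_cols = toy_lang.loc[:, toy_lang.columns.str.startswith("most")]

print(list(range_cols.columns))
print(list(most_cols.columns))
```

Plain `[]` accepts a list of column names, but it has no equivalent for the label range in the first call, which is the motivation given above for introducing `loc[]`.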


```{code-cell} ipython3
:tags: ["output_scroll"]
tidy_lang[["language", "region", "most_at_home", "most_at_work"]]
```
Contributor Author

It was weird to have the first example in the "using loc" section be []. Now the first example is a repeat of what we did in Ch 1 (basic usage), followed by "here's what else you can do with loc"


```{code-cell} ipython3
region_lang.groupby("region")["most_at_home"].agg(["min", "max"])
```

The resulting dataframe has `region` as an index name.
-This is similar to what happened when we reshaped data frames in the previous chapter,
+This is similar to what happened when we used the `pivot` function
+in the section on {ref}`pivot-wider`;
and just as we did then,
Contributor Author

it's not in the previous chapter -- it's in the pivot section
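
For context on the index behaviour the quoted text describes, here is a small sketch. The data frame `toy_region_lang` and its values are invented, and the use of `reset_index` is an assumption about where the truncated sentence ("and just as we did then,") is heading; the quoted hunk does not show it.

```python
import pandas as pd

# Hypothetical stand-in for region_lang; all values are invented.
toy_region_lang = pd.DataFrame({
    "region": ["Toronto", "Toronto", "Halifax", "Halifax"],
    "most_at_home": [10, 50, 5, 20],
})

summary = toy_region_lang.groupby("region")["most_at_home"].agg(["min", "max"])

# The group labels become the index, and that index is named "region".
print(summary.index.name)

# Assumed follow-up: reset_index turns "region" back into an ordinary column.
flat = summary.reset_index()
print(list(flat.columns))
```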

the mean and standard deviation of all of the columns between `"mother_tongue"` and `"lang_known"`.
We use `[]` to specify the columns and then `agg` to ask for both the `mean` and `std`.
you can first use `[]` or `.loc[]` to select those columns,
and then ask for the summary statistic
Contributor Author

your example right after used .loc[] so I added that to the text
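
A minimal sketch of the "select columns, then summarize" pattern the revised text describes, using `.loc[]` for the column range. The data frame and its values are invented; only the column names `mother_tongue` through `lang_known` come from the quoted text.

```python
import pandas as pd

# Hypothetical stand-in for region_lang; all values are invented.
toy_region_lang = pd.DataFrame({
    "region": ["Toronto", "Halifax", "Ottawa"],
    "mother_tongue": [10, 20, 30],
    "most_at_home": [1, 2, 3],
    "most_at_work": [4, 4, 4],
    "lang_known": [100, 300, 200],
})

# Step 1: pick every column from "mother_tongue" through "lang_known".
# Step 2: ask for both summary statistics at once.
stats = toy_region_lang.loc[:, "mother_tongue":"lang_known"].agg(["mean", "std"])
print(stats)
```

The result has one row per statistic and one column per selected variable, so the non-numeric `region` column never reaches `agg`.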


```{code-cell} ipython3
tidy_lang[:10]
```
Contributor Author

I got rid of this shorthand example entirely. I think students will confuse it with the `[]` operator introduced earlier. They might (and eventually will) see it in the wild, but I think the priority here is to make sure everyone understands what we teach and to keep things contained, not to cover every possible thing they'll see beyond the class.
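
To make the potential confusion concrete, here is a small sketch on an invented data frame (`toy` is hypothetical): the same `[]` operator selects a column when given a label, but selects rows by position when given a slice.

```python
import pandas as pd

# Hypothetical data; values are invented for illustration.
toy = pd.DataFrame({
    "language": ["English", "French", "Cree"],
    "most_at_home": [3, 2, 1],
})

by_label = toy["language"]  # a string inside [] selects a column
by_slice = toy[:2]          # a slice inside [] selects rows by position
explicit = toy.iloc[:2]     # the explicit positional form of the same selection

print(by_slice.equals(explicit))
```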


```{code-cell} ipython3
:tags: ["output_scroll"]
tidy_lang.loc[:, :"language"]
```
Contributor Author

I figured including a "before" example would be helpful here too -- this syntax is a bit weird, so being more explicit is useful.
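
For readers unfamiliar with open-ended label ranges, here is a minimal sketch of the "before" and "after" forms side by side. The data frame is invented and merely reuses the chapter's column names.

```python
import pandas as pd

# Hypothetical stand-in for tidy_lang; all values are invented.
toy_lang = pd.DataFrame({
    "category": ["Official languages", "Official languages"],
    "language": ["English", "French"],
    "region": ["Toronto", "Vancouver"],
    "most_at_home": [100, 200],
})

# Everything from the first column up to and including "language".
before = toy_lang.loc[:, :"language"]

# Everything from "language" through the last column.
after = toy_lang.loc[:, "language":]

print(list(before.columns))
print(list(after.columns))
```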

@trevorcampbell
Contributor Author

@joelostblom this PR edits your other PR -- if you are happy with these edits, we can merge this one and then merge your PR into main

Contributor

@joelostblom joelostblom left a comment

This looks good to me, thanks for making these changes!

@joelostblom joelostblom merged commit 28608ab into ch3-suggestions Jul 29, 2023
joelostblom added a commit that referenced this pull request Jul 29, 2023
* Align explanation of loc and iloc with the intro chapter

* Explain aggregations more intuitively

* Remove loc from groupby section and simplify it

* Add mention of value counts for group sizes

Preferred over size for me since it has `normalize`

* Note that [] cannot be used for ranges and we need loc[] for that

* Update startswith with the correct explanation

* Fix typo

* tc polish on wrangling (#186)

---------

Co-authored-by: Trevor Campbell <[email protected]>