-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC : Updated the comparison with spreadsheets with GroupBy #39965
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0xpranjal
commented
Feb 22, 2021
•
edited
Loading
edited
- part of DOC: add comparison to spreadsheets #38990
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
doc/source/getting_started/comparison/comparison_with_spreadsheets.rst
Outdated
Show resolved
Hide resolved
doc/source/getting_started/comparison/comparison_with_spreadsheets.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI that I'm mentoring @Bhard27 through HackIllinois (#39349), so am being more nitpicky than I would for most pull request reviews.
Nice work! I believe the build failure is a flaky test unrelated to this change, so can be ignored.
Unfortunately, Power Query doesn't seem to be compatible with Office 365 or Office for Mac, which are the versions I have, so I can't try it.
We do have a section on Pivot Tables below, which is a different way of grouping by row. Let's:
- Create sub-headings under
GroupBy
forPower Query
andPivot Tables
, moving the content for the latter up - Explicitly state that Excel defines "pivot" more broadly than pandas
Thanks!
To aggregate a column, select the column to perform the Aggregate Operation on from the Column drop-down. | ||
A Row Operation does not require a Column, since data is grouped based on table rows. | ||
|
||
.. image:: ../../_static/spreadsheets/group-by.png |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, this section duplicates content from the canonical documentation, which doesn't add any value. I suggest we either:
- Walk them through an example in Excel that matches the pandas example below, so we are comparing apples to apples
- Cut from "Then using the…" through this image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afeld - are you good here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing this again, I see that the requested change was not made. I also agree that having an excel example that matches the pandas example would be desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Bhard27 Are you able to update the excel image to match the pandas example below?
doc/source/getting_started/comparison/comparison_with_spreadsheets.rst
Outdated
Show resolved
Hide resolved
# default is axis=0 | ||
grouped = df.groupby("class") | ||
grouped = df.groupby("order", axis="columns") | ||
grouped = df.groupby(["class", "order"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assigning to a variable doesn't output anything, and even if it did, it would be something not useful like
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f8424dc3d90>
Let's do groupby().mean()
to show something more interesting.
In Excel, this can be done by using the `Query Editor <https://support.microsoft.com/en-us/office/introduction-to-the-query-editor-power-query-1d6cdb63-bf70-4ae8-a7d5-6ae9547004d9>`_. | ||
You can group the values in various rows into a single value by grouping the rows according to the values in one or more columns | ||
Then using the `Query Editor ribbon <https://support.microsoft.com/en-us/office/combine-data-from-multiple-data-sources-power-query-70cfe661-5a2a-4d9d-a4fe-586cc7878c7d>`_, Right-click the column header that you want to group on, | ||
and click `Group By <https://support.microsoft.com/en-us/office/group-rows-in-a-table-power-query-e1b9e916-6fcc-40bf-a6e8-ef928240adf1>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see Excel changing where to find the Query Editor, so let's try and future-proof this documentation by not getting into those details, directing readers to the canonical Excel documentation instead.
], | ||
index=["falcon", "parrot", "lion", "monkey", "leopard"], | ||
columns=("class", "order", "max_speed"), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an outstanding task in #38990 around standardizing the example datasets used in the page. Totally fine to take care of that as follow-up; just noting.
Oh, one more thing:
|
…eets.rst Co-authored-by: Aidan Feldman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outstanding comments still hold, but I think it's fine to take care of those as follow-ups. I don't have the ability to merge, but 👍 from me.
Also, the build failure seems unrelated to this change. |
GroupBy | ||
------- | ||
|
||
In Excel, this can be done by using the `Query Editor <https://support.microsoft.com/en-us/office/introduction-to-the-query-editor-power-query-1d6cdb63-bf70-4ae8-a7d5-6ae9547004d9>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This page says it will be retired soon, perhaps https://support.microsoft.com/en-us/office/about-power-query-in-excel-7104fbee-9e62-4cb9-a02e-5bfb1a6c536a is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - can you merge master, I think it will resolve the CI issue.
@Bhard27 - Are you interested in finishing this up? I think changes look good, can you merge master and will have another look. |
Hey, I have pushed the changes you suggested. Is there anything else that needs to be done in this? Also, I don't have permission to merge this to master. |
@Bhard27 - yes, can you merge the master branch into your PR branch (happy to help with further instructions if you're unfamiliar with doing this, just let me know). We like to do this to run the CI on the most recent changes with master.
Right, that needs to be done by a maintainer. Even maintainers shouldn't merge there own PRs ;) |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
@Bhard27 - friendly ping, are you interested in continuing this? |
@Bhard27 I think what you've screenshot is the fact that you are doing a pull request from your branch into master. This branch was last updated on February 28th, and the tests that ran were based on what this patch (which hasn't changed) and master (which has changed) combined together. In general, we'd like to make sure the tests get run on a very recent copy of master, so the request here is to merge the master branch into this PR branch. Doing so will run the tests again on all the updated code. To do this, run the command |
Oh, Thanks a lot for the explanation. I totally understand why we need to merge it again and run the tests. I will do that ASAP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for merging @Bhard27 - just the one outstanding request by @afeld. I think it would be good if the pandas example matched the excel example, but okay if it doesn't.
To aggregate a column, select the column to perform the Aggregate Operation on from the Column drop-down. | ||
A Row Operation does not require a Column, since data is grouped based on table rows. | ||
|
||
.. image:: ../../_static/spreadsheets/group-by.png |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Bhard27 Are you able to update the excel image to match the pandas example below?
@Bhard27 - closing this as stale, but reply here if you are interested in continuing and can reopen. |