DOC: add Comparison with Excel #38554

Merged
merged 18 commits into from
Dec 28, 2020
Conversation

afeld
Member

@afeld afeld commented Dec 18, 2020

Background

I teach a class on pandas for public policy students, and for many of them, spreadsheets are the only point of reference they have for working with tabular data. It would be very helpful to have an (official) document comparing the two to point them to.

This is my first contribution to pandas and first time using reStructuredText, so feedback welcome. Thanks in advance!

TODOs

Making a running checklist to show what I've done already, and what else I plan to do. Hoping for some preliminary feedback (like is there still interest in having this page) before spending too much more time on it.

Happy to continue in this pull request until all of them are complete, or get this merged sooner rather than later and take care of the others in follow-up pull requests. Slight preference for the latter (some documentation being better than none, less to review at once, etc.), but open to whatever.

Questions

@afeld afeld force-pushed the excel branch 2 times, most recently from d51ad4d to 78da468 Compare December 18, 2020 06:35
@MarcoGorelli MarcoGorelli self-requested a review December 18, 2020 10:58
Member

@MarcoGorelli MarcoGorelli left a comment

Thanks @afeld - I built this locally and it generally looks good

To answer some questions:

Which whatsnew file should I add to?

It shouldn't be necessary to add one, that's usually for new features / bug and regression fixes

Since spreadsheet software is largely interchangeable/compatible, would it make sense to make the page more general as "Comparison to spreadsheets"?

I think so, perhaps it could be "Comparison with spreadsheets (e.g. Excel)"?

EDIT

I don't think this needs a whatsnew note

Comment on lines +1 to +9
If you're new to pandas, you might want to first read through :ref:`10 Minutes to pandas<10min>`
to familiarize yourself with the library.

As is customary, we import pandas and NumPy as follows:

.. ipython:: python

    import pandas as pd
    import numpy as np
Member

nice!

@MarcoGorelli MarcoGorelli added this to the 1.3 milestone Dec 19, 2020
@jreback
Contributor

jreback commented Dec 21, 2020

@afeld can you post a rendered picture of the new docs page here

Contributor

@jreback jreback left a comment

@afeld looks great. ideally if you can add the excel (and pandas) references. am happy to merge (and can keep adding things in later PRs).


``DataFrame``, worksheet
``Series``, column
``Index``, row headings
Contributor

should also indicate that the row labels themselves are akin to the default RangeIndex

Member Author

Added a mention of RangeIndex below. That work?
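
For reference, a minimal sketch of the terminology under discussion: a ``DataFrame`` built without an explicit index gets a ``RangeIndex`` (0, 1, 2, ...), much like the row numbers of a worksheet, and a single column is a ``Series`` that shares it. The toy data and variable names here are illustrative assumptions, not taken from the docs page.

```python
import pandas as pd

# Hypothetical toy data, not taken from the docs page under review.
df = pd.DataFrame({"item": ["pen", "ink", "pad"], "price": [1.5, 4.0, 2.25]})

# No index was passed, so pandas falls back to a RangeIndex starting at 0.
default_index = df.index
is_range_index = isinstance(default_index, pd.RangeIndex)
index_labels = list(default_index)

# A single column pulled out of the DataFrame is a Series sharing that index.
price = df["price"]
```

Note that the pandas labels start at 0, whereas spreadsheet row numbers start at 1.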

General terminology translation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. csv-table::
Contributor

it might be useful to include .png's here if it helps explain material

Member Author

Sure, will put something together. Ok as a follow-up?

Contributor

sure

@afeld
Member Author

afeld commented Dec 26, 2020

Screenshot of the new page:

[Image: screenshot of comparison to Excel page]

Let me know what you think! Nudge on the questions up top. I know there are a lot of commits in here; let me know if you want me to squash.

More I want to do with the page, but hoping this is close to being merge-able.

~~~~~~~~~

Every ``DataFrame`` and ``Series`` has an ``Index``, which are labels on the *rows* of the data. In
pandas, if no index is specified, a :class:`~pandas.RangeIndex` is used by default (first row = 0,
Member Author

I wasn't quite sure whether to use the ~ in the class/method/function references, so let me know if you want that changed.

Contributor

this is fine


url = (
"https://raw.github.com/pandas-dev"
"/pandas/master/pandas/tests/io/data/csv/tips.csv"
Member Author

I copied this code from another file and kept it consistent, but it wasn't clear why the URL was being passed in as a tuple rather than a single string.

Member

@MarcoGorelli MarcoGorelli Dec 26, 2020

this isn't a tuple, it's still a single string

>>> url = (
...     'a'
...     'b'
... )
>>> url
'ab'

I presume it's done like this to keep it from getting too long

Member Author

@afeld afeld Dec 29, 2020

Ah, I missed that there wasn't a comma! Wasn't aware of implicit string concatenation. Thanks!
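
To make the comma distinction concrete, here is a small sketch contrasting the two layouts. The URL is a made-up placeholder (an assumption for illustration), not the one in the diff.

```python
# Adjacent string literals inside parentheses are joined into one string;
# the parentheses only allow the expression to span several lines.
joined = (
    "https://example.invalid/"
    "data.csv"
)

# Adding commas between the literals turns the same layout into a tuple.
as_tuple = (
    "https://example.invalid/",
    "data.csv",
)
```

Only the comma-free form yields a single string that can be passed wherever a URL is expected.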

@jreback jreback merged commit 26a679a into pandas-dev:master Dec 28, 2020
@jreback
Contributor

jreback commented Dec 28, 2020

thanks @afeld very nice. built docs will appear here: https://pandas.pydata.org/pandas-docs/dev/ (may take a little while for the CI to do it). happy to have a followup as discussed.

@afeld afeld deleted the excel branch December 29, 2020 00:08
@jreback
Contributor

jreback commented Dec 30, 2020

minor point: https://pandas.pydata.org/pandas-docs/dev/getting_started/comparison/comparison_with_spreadsheets.html#pivot-tables

you can add margins=True to have the exact excel pivot table behavior you are showing in the png.
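
A rough sketch of what that suggestion does, assuming a small made-up frame and a ``sum`` aggregation (the docs page may use different data and a different ``aggfunc``): passing ``margins=True`` to ``pivot_table`` appends an ``All`` row and column holding the totals, similar to the grand totals in a spreadsheet pivot table.

```python
import pandas as pd

# Hypothetical stand-in for the tips data; values chosen for easy checking.
tips = pd.DataFrame(
    {
        "sex": ["Female", "Male", "Male", "Female"],
        "smoker": ["No", "No", "Yes", "Yes"],
        "tip": [1.0, 2.0, 3.0, 4.0],
    }
)

# Plain pivot table: one row per sex, one column per smoker value.
without_totals = tips.pivot_table(
    values="tip", index="sex", columns="smoker", aggfunc="sum"
)

# margins=True adds an "All" row and an "All" column with the totals.
with_totals = tips.pivot_table(
    values="tip", index="sex", columns="smoker", aggfunc="sum", margins=True
)
```

The label of the totals row and column can be changed with the ``margins_name`` parameter.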
