DOC: add Comparison with Excel #38554
Thanks @afeld - I built this locally and it generally looks good
To answer some questions:
Which whatsnew file should I add to?
It shouldn't be necessary to add one; that's usually for new features, bug fixes, and regression fixes
Since spreadsheet software is largely interchangeable/compatible, would it make sense to make the page more general as "Comparison to spreadsheets"?
I think so, perhaps it could be "Comparison with spreadsheets (e.g. Excel)"?
EDIT
I don't think this needs a whatsnew note
If you're new to pandas, you might want to first read through :ref:`10 Minutes to pandas<10min>`
to familiarize yourself with the library.

As is customary, we import pandas and NumPy as follows:

.. ipython:: python

    import pandas as pd
    import numpy as np
nice!
@afeld can you post a rendered picture of the new docs page here
@afeld looks great. ideally you can add the Excel (and pandas) references. am happy to merge (and can keep adding things in later PRs).
    ``DataFrame``, worksheet
    ``Series``, column
    ``Index``, row headings
should also indicate that the row labels themselves are akin to the default RangeIndex
Added a mention of RangeIndex below. That work?
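For reference, the default-index behavior discussed in this thread can be checked with a minimal sketch. The frame below is hand-made with illustrative values (it is not taken from the PR), and the 1-based re-labelling at the end is only an assumption about how one might mimic spreadsheet row headings:

```python
import pandas as pd

# A small frame built without specifying an index.
df = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01], "tip": [1.01, 1.66, 3.50]})

# With no index given, pandas falls back to a RangeIndex starting at 0.
default_index = df.index

# Hypothetical re-labelling so the row labels start at 1, like spreadsheet row headings.
one_based = df.copy()
one_based.index = pd.RangeIndex(start=1, stop=len(df) + 1)

print(default_index)
print(one_based.index)
```

The data itself is unchanged by the re-labelling; only the row labels differ.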
General terminology translation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. csv-table::
it might be useful to include .png's here if it helps explain material
Sure, will put something together. Ok as a follow-up?
sure
- Format Excel comparison code samples with [blacken-docs](https://github.com/asottile/blacken-docs)
- Fix `SettingWithCopyWarning`s
- Mention apply() in documentation around deriving columns
- Simplify code for doing column subtraction in Excel doc
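To illustrate what the commits above refer to, here is a minimal sketch of deriving a column by plain column subtraction versus `apply()`. The frame and the derived column names are hypothetical (the `total_bill` and `tip` names are borrowed from the tips dataset), and this is not the code from the PR itself:

```python
import pandas as pd

# Hand-made stand-in for a few rows of the tips dataset (illustrative values).
tips = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01], "tip": [1.01, 1.66, 3.50]})

# Vectorized column subtraction: usually the simplest way to derive a column.
tips["pre_tip"] = tips["total_bill"] - tips["tip"]

# The same result via apply(), evaluated row by row (axis=1).
tips["pre_tip_apply"] = tips.apply(lambda row: row["total_bill"] - row["tip"], axis=1)

# Taking an explicit copy of a filtered subset before assigning to it is one
# common way to avoid SettingWithCopyWarning.
big_bills = tips[tips["total_bill"] > 15].copy()
big_bills["tip_share"] = big_bills["tip"] / big_bills["total_bill"]

print(tips)
print(big_bills)
```

The vectorized form is generally preferable when it is available; `apply()` is mainly useful when the per-row logic cannot be expressed as column arithmetic.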
~~~~~~~~~

Every ``DataFrame`` and ``Series`` has an ``Index``, which are labels on the *rows* of the data. In
pandas, if no index is specified, a :class:`~pandas.RangeIndex` is used by default (first row = 0,
I wasn't quite sure whether to use the ~ in the class/method/function references, so let me know if you want that changed.
this is fine
url = (
    "https://raw.github.com/pandas-dev"
    "/pandas/master/pandas/tests/io/data/csv/tips.csv"
I copied this code from another file and kept it consistent, but it wasn't clear why the URL was being passed in as a tuple rather than a single string.
this isn't a tuple, it's still a single string
>>> url = (
... 'a'
... 'b'
... )
>>> url
'ab'
I presume it's done like this to keep it from getting too long
Ah, I missed that there wasn't a comma! Wasn't aware of implicit string concatenation. Thanks!
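To make the distinction in this thread concrete, a small sketch contrasting implicit string concatenation with an actual tuple. The variable names `as_string` and `as_tuple` are made up for illustration:

```python
# Adjacent string literals inside parentheses are joined into one string.
url = (
    "https://raw.github.com/pandas-dev"
    "/pandas/master/pandas/tests/io/data/csv/tips.csv"
)

# No comma between the literals: still a single str.
as_string = ("a" "b")

# A comma between the literals is what makes it a tuple.
as_tuple = ("a", "b")

print(type(url).__name__, url)
print(type(as_string).__name__, as_string)
print(type(as_tuple).__name__, as_tuple)
```

The parentheses only group the expression so that it can span several lines; the comma is what creates a tuple.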
Co-authored-by: Marco Gorelli <[email protected]>
thanks @afeld very nice. built docs will appear here: https://pandas.pydata.org/pandas-docs/dev/ (may take a little while for the CI to build them). happy to have a followup as discussed.
minor point: https://pandas.pydata.org/pandas-docs/dev/getting_started/comparison/comparison_with_spreadsheets.html#pivot-tables you can add
- passes `black pandas`
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry

Background
I teach a class on pandas for public policy students, and for many of them, spreadsheets are the only point of reference they have for working with tabular data. It would be very helpful to have an (official) document comparing the two to point them to.
This is my first contribution to pandas and first time using reStructuredText, so feedback welcome. Thanks in advance!
TODOs
Making a running checklist to show what I've done already, and what else I plan to do. Hoping for some preliminary feedback (like is there still interest in having this page) before spending too much more time on it.
Happy to continue in this pull request until complete with all of them, or get this merged sooner than later and take care of the others in follow-up pull requests. Slight preference for the latter (some documentation being better than none, less to review at once, etc.), but open to whatever.
- `read_excel()`
- `tips` dataset

Questions
- `doc/source/_static` is in the `.gitignore`, but there are files checked into that folder. Is that intentional?
  - CLN: remove duplicate banklist.html file #38739