Skip to content

DOC: Add references for different index types + examples for pd.Index #17680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 6, 2017

Conversation

topper-123
Copy link
Contributor

  • [x ] passes git diff upstream/master -u -- "*.py" | flake8 --diff

Added 'See also'-references for different index types to make it easier to find the correct index type when in doubt + added simple examples for pd.Index


See Also
---------
RangeIndex : Index implementing a monotonic integer range (very
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that most of these are self-describing, so you can leave off the description. I would keep the descriptions for

  • RangeIndex
  • MultiIndex
  • IntervalIndex
  • CategoricalIndex (maybe)

The rest can be put on a couple lines likes

DatetimeIndex, PeriodIndex, TimedeltaIndex
Int64Index, UInt64Index, Flaot64Index

since the names tell you pretty much everything you need to know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done.

@codecov
Copy link

codecov bot commented Sep 26, 2017

Codecov Report

Merging #17680 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17680      +/-   ##
==========================================
+ Coverage   91.23%   91.24%   +<.01%     
==========================================
  Files         163      163              
  Lines       49810    49810              
==========================================
+ Hits        45444    45447       +3     
+ Misses       4366     4363       -3
Flag Coverage Δ
#multiple 89.03% <ø> (+0.02%) ⬆️
#single 40.32% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/range.py 92.83% <ø> (ø) ⬆️
pandas/core/indexes/base.py 96.29% <ø> (ø) ⬆️
pandas/core/indexes/timedeltas.py 91.19% <ø> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.53% <ø> (ø) ⬆️
pandas/core/indexes/numeric.py 97.18% <ø> (ø) ⬆️
pandas/core/indexes/interval.py 92.85% <ø> (ø) ⬆️
pandas/core/indexes/period.py 92.77% <ø> (ø) ⬆️
pandas/core/indexes/multi.py 96.9% <ø> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.73% <0%> (-0.1%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d239d9...723f696. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 26, 2017

Codecov Report

Merging #17680 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17680      +/-   ##
==========================================
+ Coverage   91.23%   91.24%   +<.01%     
==========================================
  Files         163      163              
  Lines       49810    49810              
==========================================
+ Hits        45444    45447       +3     
+ Misses       4366     4363       -3
Flag Coverage Δ
#multiple 89.03% <ø> (+0.02%) ⬆️
#single 40.32% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 96.9% <ø> (ø) ⬆️
pandas/core/indexes/range.py 92.83% <ø> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.53% <ø> (ø) ⬆️
pandas/core/indexes/period.py 92.77% <ø> (ø) ⬆️
pandas/core/indexes/base.py 96.29% <ø> (ø) ⬆️
pandas/core/indexes/timedeltas.py 91.19% <ø> (ø) ⬆️
pandas/core/indexes/numeric.py 97.18% <ø> (ø) ⬆️
pandas/core/indexes/interval.py 92.85% <ø> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.73% <0%> (-0.1%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d239d9...723f696. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 26, 2017

Codecov Report

Merging #17680 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17680      +/-   ##
==========================================
- Coverage   91.24%   91.22%   -0.03%     
==========================================
  Files         163      163              
  Lines       49967    49967              
==========================================
- Hits        45593    45583      -10     
- Misses       4374     4384      +10
Flag Coverage Δ
#multiple 89.03% <ø> (-0.01%) ⬇️
#single 40.24% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 96.39% <ø> (ø) ⬆️
pandas/core/indexes/period.py 92.78% <ø> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.48% <ø> (-0.1%) ⬇️
pandas/core/indexes/timedeltas.py 91.19% <ø> (ø) ⬆️
pandas/core/indexes/interval.py 92.85% <ø> (ø) ⬆️
pandas/core/indexes/range.py 92.83% <ø> (ø) ⬆️
pandas/core/indexes/base.py 96.46% <ø> (ø) ⬆️
pandas/core/indexes/numeric.py 97.18% <ø> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.74% <0%> (-0.1%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 22515f5...7688040. Read the comment docs.

@jreback jreback added the Docs label Sep 26, 2017
memory-efficient)
CategoricalIndex : Index with :class:`Categorical` s.
MultiIndex : A multi-level, or hierarchical, Index
IntervalIndex : an Index of :class:`Interval` s.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check locally that those links work inside a 'See also' description ?

Copy link
Contributor Author

@topper-123 topper-123 Sep 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't. I've started building the docs and will to the check when it's built.

Copy link
Contributor Author

@topper-123 topper-123 Sep 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PeriodIndex has no link, others are ok. I assume it's missing in API.rst also?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 26, 2017 via email

See Also
---------
Index : The base pandas Index type
DatetimeIndex : Index with datetime64 data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can u also like to Period (like TimedeltaIndex)?

Index : The base pandas Index type
TimeDelta : Represents a duration between two dates or times.
DatetimeIndex : Index with datetime64 data
PeriodIndex : Index holding regular time periods such as years, quarters,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer "Index with Period data", detail should be explained in PeriodIndex.

@topper-123
Copy link
Contributor Author

topper-123 commented Sep 27, 2017

I've changed according to the various comment.

In addition, I've also written a section on PeriodIndex in API.rst and links to PeriodIndex are now ok.

EDIT: there were some error for Period in API.rst, which are corrected.

@topper-123
Copy link
Contributor Author

topper-123 commented Sep 27, 2017

One more thing: In the doc string for Int64Index it says: "This is the default index type used by the DataFrame and Series ctors when no explicit index is provided by the user"

I assume this is wrong and the phrase should be moved over to RangeIndex?

EDIT: I've moved that phrase over to RangeIndex. Let me know it you disagree.

@topper-123 topper-123 force-pushed the Index_docs branch 2 times, most recently from b8b68f9 to 3055edb Compare September 27, 2017 06:01
See Also
---------
RangeIndex : Index implementing a monotonic integer range (very
memory-efficient)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth mentioning that RangeIndex also makes some index operations computationally efficient.

---------
RangeIndex : Index implementing a monotonic integer range (very
memory-efficient)
CategoricalIndex : Index with :class:`Categorical` s.
Copy link
Member

@jschendel jschendel Sep 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: The CategoricalIndex description is structured the same way as IntervalIndex below, but worded slightly differently, "Index with" vs. "an Index of", which could be made consistent.

---------
Index : The base pandas Index type
TimedeltaIndex : Index of timedelta64 data
PeriodIndex : Index with Period data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Slightly different wording between TimedeltaIndex and PeriodIndex ("Index of" vs. "Index with") could be made consistent.

Index : The base pandas Index type
Period : Represents a period of time
DatetimeIndex : Index with datetime64 data
TimedeltaIndex : Index of timedelta64 data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same ("Index with" vs "Index of")


RangeIndex is a memory-saving special case of Int64Index limited to
representing monotonic ranges. This is the default index type used
by DataFrame and Series when no explicit index is provided by the user.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous RangeIndex comment: could potentially mention that it makes some index operations computationally efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any examples, so I can understand it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since RangeIndex is defined entirely from the start, stop, and step, you can use these values to simply compute some properties in constant time instead of needing to do a pass over the index in linear time, e.g. max, min. Similarly, operations between two RangeIndex can be computed in constant time, e.g. intersection, equals. You also get some properties for free (or constant time) that can be beneficial for index operations, e.g. monotonic, non-null, uniqueness.

Copy link
Contributor Author

@topper-123 topper-123 Sep 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see it can be fasterst index type (also simply by virtue of not having to build a Int64-index everytime a df is created), but I feel more complex advantages should be referenced from the doc string to an explanation elsewere rather than being explained in the doc string itself. I.e. the doc string should only make that statement if there's some further explanation elsewere, e.g. in the user guide, which there isn't ATM (AFAICT).

Maybe you could take a stab at explaining this succintly in the user guide and add a later PR that builds on top of mine?

Copy link
Member

@jschendel jschendel Sep 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to suggest that you add details about it in depth, just merely mention the fact. Like how you're not actually explaining why RangeIndex is memory-saving, you just mention that it is. So just something along the lines of "memory-efficient" -> "memory and computationally efficient" was all I was suggesting. Again, not something I'm dead set on including, just a thought.

@jschendel
Copy link
Member

Added some very small comments. This is fine as-is though; you don't need to go out of your way to address my comments unless you want to.

@jorisvandenbossche
Copy link
Member

One more thing: In the doc string for Int64Index it says: "This is the default index type used by the DataFrame and Series ctors when no explicit index is provided by the user"
I assume this is wrong and the phrase should be moved over to RangeIndex?

Yes that is a left-over from when RangeIndex did not exist.

.. autosummary::
:toctree: generated/

PeriodIndex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also want to have this as 'class without autosummary' to not include all the attributes (but listing the different ones as you did below is certainly fine!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from adding the "class_without_autosummary" template, you should also add it to the 'hack' in numpydoc, see the diff of #17642

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done (I think, not sure what these actually do, please review it it's as intended).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that looks good

PeriodIndex.is_leap_year
PeriodIndex.minute
PeriodIndex.month
PeriodIndex.ordinal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PeriodIndex.ordinal does this exist? (doc build gives problems with it, and locally I also don't see it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not there. I'll remove it.

@topper-123 topper-123 force-pushed the Index_docs branch 3 times, most recently from ac31c44 to e88bb31 Compare September 29, 2017 22:52
@topper-123
Copy link
Contributor Author

I've added a comment in RangeIndex regarding computational speed, otherwise unchanged.

Is this ok to accept, or are there reservations?

@jorisvandenbossche jorisvandenbossche merged commit a211b51 into pandas-dev:master Oct 6, 2017
@jorisvandenbossche
Copy link
Member

Yes, thanks a lot @topper-123 !

@jorisvandenbossche jorisvandenbossche added this to the 0.21.0 milestone Oct 6, 2017
@topper-123 topper-123 deleted the Index_docs branch October 9, 2017 21:00
ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants