Skip to content

DOC: better document Dtypes docstrings + avoid sphinx warnings #26067

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Apr 12, 2019

  • Use class_without_autosummary template for the dtypes (similar to what CategoricalDtype already did) to avoid a bunch of sphinx warnings
  • Therefore, explicitly list some attributes in the docstrings (for now, I went with only documenting the dtype-specific "metadata")
  • And while at it, I also gave the docstrings an update (documenting the parameters, etc)
  • And for doing the above, turned some metadata attributes of the dtypes into properties, so they 1) can be documented (have a docstring) b) cannot be set by the user (xref Dtype attributes (eg PeriodDtype.freq) should not be settable #26096)

@jorisvandenbossche jorisvandenbossche added Docs Dtype Conversions Unexpected or buggy dtype conversions labels Apr 12, 2019
@jorisvandenbossche jorisvandenbossche added this to the 0.25.0 milestone Apr 12, 2019
@codecov
Copy link

codecov bot commented Apr 15, 2019

Codecov Report

Merging #26067 into master will decrease coverage by 51.2%.
The diff coverage is 85.71%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #26067       +/-   ##
===========================================
- Coverage   91.95%   40.75%   -51.21%     
===========================================
  Files         175      175               
  Lines       52427    52514       +87     
===========================================
- Hits        48208    21400    -26808     
- Misses       4219    31114    +26895
Flag Coverage Δ
#multiple ?
#single 40.75% <85.71%> (-0.12%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/sparse.py 40.47% <ø> (-51.84%) ⬇️
pandas/core/arrays/integer.py 44.18% <100%> (-52.15%) ⬇️
pandas/core/dtypes/dtypes.py 73.86% <85%> (-22.49%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.37%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.16%) ⬇️
... and 135 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d9c9e2f...8818681. Read the comment docs.

@codecov
Copy link

codecov bot commented Apr 15, 2019

Codecov Report

Merging #26067 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26067      +/-   ##
==========================================
- Coverage   91.99%   91.98%   -0.01%     
==========================================
  Files         175      175              
  Lines       52383    52394      +11     
==========================================
+ Hits        48188    48196       +8     
- Misses       4195     4198       +3
Flag Coverage Δ
#multiple 90.54% <100%> (ø) ⬆️
#single 40.74% <60%> (-0.15%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/sparse.py 92.3% <ø> (ø) ⬆️
pandas/core/dtypes/dtypes.py 96.42% <100%> (+0.08%) ⬆️
pandas/core/arrays/integer.py 96.34% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 75% <0%> (-12.5%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.71% <0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f53bb06...c8ae0c9. Read the comment docs.

@jorisvandenbossche
Copy link
Member Author

@jreback could you take a look at this, if you have some time? (specifically the actual code changes in the dtypes, as this might have some implications for pickling)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, just being sure we actually have enough coverage for pickling of these

def __setstate__(self, state):
# for pickle compat.
self._freq = state['freq']

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a round-trip on this test? shouldn't you also need __getstate__? or maybe not as you are setting _freq which is now good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__getstate__ is implemented in the super class:

def __getstate__(self):
# pickle support; we don't want to pickle the cache
return {k: getattr(self, k, None) for k in self._metadata}

Added in 154a647

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's the round trip test that was failing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, what if you move _freq into metdata then? (and so on for the other subclasses)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I have been thinking about that as well. _metadata is also used for other things though (that are included in the EA interface, this __getstate__ is only for our internal dtypes). Eg for equality. For equality it would not matter if freq or _freq is included, but if we would want to use it eg also for a default repr, then it need to be the public attributes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, yeah they are coupled. i don't really like having to be explicit about the pickle compat here.....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't really like having to be explicit about the pickle compat here.....

Yep, I agree. Although it is consistent with how it is already done in the other dtypes (Categorical actually has the same), it would be nice to clean it up. I could also add a different _metadata_pickles list, so we can handle both getstate and setstate in the PandasExtensionDtype superclass. Of course that doesn't change the implementation, but it does reduce some duplication in the code.

@jorisvandenbossche
Copy link
Member Author

why exactly is this needed? e

It's also doing the same as what is already done in DatetimeTZDtype:

def __setstate__(self, state):
# for pickle compat.
self._tz = state['tz']
self._unit = state['unit']

I was only wondering if we need to worry about old pickle compat, as the above code I am showing was done in the initial implementation of DatetimeTZDtype, while for PeriodDtype that I am changing here, we already have released versions without this.

@jreback
Copy link
Contributor

jreback commented Apr 15, 2019

right, see my comment above, I think that this might just work if you move the private attributes into _metadata (and leave the accessors as propertys)

@jorisvandenbossche
Copy link
Member Author

@jreback are you fine with merging this as is?
It is consistent with how the existing CategoricalDtype and DatetimeTZDtype handle pickling compat, so I think it is fine. We could try to clean it up a bit by moving some part to the base PandasExtensionDtype, but that could also be done separate.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment otherwise lgtm

@@ -963,6 +1016,10 @@ def __eq__(self, other):
from pandas.core.dtypes.common import is_dtype_equal
return is_dtype_equal(self.subtype, other.subtype)

def __setstate__(self, state):
# for pickle compat.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think would add a bit more comment here. (in all of the setstates to indicate why this is needed, maybe reference this issue)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, expanded the comment!

@jreback
Copy link
Contributor

jreback commented Apr 19, 2019

something faiking merge on green

@jreback jreback merged commit 3face07 into pandas-dev:master Apr 20, 2019
@jreback
Copy link
Contributor

jreback commented Apr 20, 2019

thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the doc-dtype-class-docstrings branch April 22, 2019 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants