Skip to content

ENH: is_homogeneous #22780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 20, 2018
Merged

Conversation

TomAugspurger
Copy link
Contributor

Split #22325

@jorisvandenbossche suggested moving this off of the BlockManager.

Right now, I've made this public. Do we want that? If so I'll add to api.rst, release note, etc. Otherwise, I'll make it private.

@pep8speaks
Copy link

Hello @TomAugspurger! Thanks for submitting the PR.

@jorisvandenbossche
Copy link
Member

I would maybe rather go with private for now. Or do think of good practical use cases outside of our internals?

@TomAugspurger
Copy link
Contributor Author

My main use-case would be in when writing a custom scikit-learn estimator. "Can I safely convert this DataFrame to an ndarray without converting to an ndarray of objects". Even then, it's not quite perfect as there are things like converting ints to floats, which may be OK there.

Regardless, let's keep it private for now, and open it up if there's demand..

@TomAugspurger TomAugspurger added the Dtype Conversions Unexpected or buggy dtype conversions label Sep 20, 2018
@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Sep 20, 2018
@codecov
Copy link

codecov bot commented Sep 20, 2018

Codecov Report

Merging #22780 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22780      +/-   ##
==========================================
+ Coverage   92.17%   92.18%   +<.01%     
==========================================
  Files         169      169              
  Lines       50778    50804      +26     
==========================================
+ Hits        46807    46833      +26     
  Misses       3971     3971
Flag Coverage Δ
#multiple 90.59% <100%> (ø) ⬆️
#single 42.33% <37.5%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 95.45% <100%> (ø) ⬆️
pandas/core/base.py 97.61% <100%> (+0.01%) ⬆️
pandas/core/frame.py 97.2% <100%> (ø) ⬆️
pandas/core/dtypes/base.py 100% <0%> (ø) ⬆️
pandas/util/testing.py 86.03% <0%> (ø) ⬆️
pandas/core/arrays/categorical.py 95.75% <0%> (+0.01%) ⬆️
pandas/core/dtypes/dtypes.py 96.11% <0%> (+0.03%) ⬆️
pandas/core/common.py 97.44% <0%> (+0.05%) ⬆️
pandas/core/dtypes/common.py 95.02% <0%> (+0.08%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c113db...332dbca. Read the comment docs.

@jorisvandenbossche
Copy link
Member

Although some dtypes will still convert to object dtype of course (but that's maybe not your direct concern). Isn't it rather something like "all numeric dtypes" that you need there? (as indeed mixed int and float might not be a problem)

@TomAugspurger
Copy link
Contributor Author

Yes, most likely.

For reference though, this is directly useful within pandas for determining whether we can safely do a cross-section on a DataFrame containing extension dtypes. If they're all the same type we want to take a special path. A PR implementing that is incoming, but this felt separate enough to be standalone.

@jorisvandenbossche
Copy link
Member

For reference though, this is directly useful within pandas for determining whether we can safely do a cross-section on a DataFrame containing extension dtypes

Yes, I know, but that seems more internal usage (therefore my question).

But as you said, can always make it public later if there is demand / use case.

@jreback
Copy link
Contributor

jreback commented Sep 20, 2018

his should be internal only

@@ -288,6 +288,26 @@ def _verify_integrity(self, labels=None, levels=None):
def levels(self):
return self._levels

@property
def _is_homogeneous(self):
"""Whether the levels of a MultiIndex all have the same dtype.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u share docstrings at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they're all different enough that sharing would be burdensome.

@TomAugspurger
Copy link
Contributor Author

Does this look OK? I'd like to get back to SparseArray today, which will depend on this and #22785.

>>> DataFrame({"A": [1, 2], "B": [3.0, 4.0]})._is_homogeneous
False

Items with the type but different sizes are considered different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the type -> the same type

@jreback
Copy link
Contributor

jreback commented Sep 20, 2018

maybe name _is_homogeneous_type to be consistent?

otherwise lgtm

@TomAugspurger
Copy link
Contributor Author

Unlike is_*_dtype, I don't think there's any ambiguity here as to whether this is an array or dtype that's being checked, so I prefer the shorter name if that's OK.

@TomAugspurger TomAugspurger merged commit 0480f4c into pandas-dev:master Sep 20, 2018
@TomAugspurger TomAugspurger deleted the is_homogenous branch September 20, 2018 16:25
@jreback
Copy link
Contributor

jreback commented Sep 20, 2018

no it IS different, this is like is_mixed_type (and not dype things)

pls change this

@TomAugspurger
Copy link
Contributor Author

Ahh, I see what you think it should be consistent with. Will fix that in #22785.

@jorisvandenbossche
Copy link
Member

If we want to add a suffix, I personally find 'dtype' more logical as 'type', since it is the dtype that is checked to be homogenous

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants