ENH: is_homogeneous #22780
Hello @TomAugspurger! Thanks for submitting the PR.
I would maybe rather go with private for now. Or can you think of good practical use cases outside of our internals?
My main use case would be when writing a custom scikit-learn estimator: "Can I safely convert this DataFrame to an ndarray without ending up with an ndarray of objects?" Even then, it's not quite perfect, as there are things like converting ints to floats, which may be OK there. Regardless, let's keep it private for now and open it up if there's demand.
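A minimal sketch of the scikit-learn use case, using only public pandas API. The helper name is_homogeneous is hypothetical (it stands in for the private property under discussion) and assumes an exact-dtype comparison over DataFrame.dtypes. It also illustrates the caveat above: an int/float frame fails the exact check even though it converts to a float64 array rather than to objects.

```python
import numpy as np
import pandas as pd


def is_homogeneous(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every column has exactly the same dtype.

    Uses the public DataFrame.dtypes; a frame with no columns counts
    as homogeneous.
    """
    return len(set(df.dtypes)) <= 1


same = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
int_float = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.0]})
int_str = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})

for name, frame in [("same", same), ("int_float", int_float), ("int_str", int_str)]:
    # Compare the homogeneity check with the dtype to_numpy() actually produces.
    print(name, is_homogeneous(frame), frame.to_numpy().dtype)
```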
Codecov Report
@@ Coverage Diff @@
## master #22780 +/- ##
==========================================
+ Coverage 92.17% 92.18% +<.01%
==========================================
Files 169 169
Lines 50778 50804 +26
==========================================
+ Hits 46807 46833 +26
Misses 3971 3971
Although some dtypes will still convert to object dtype, of course (but that's maybe not your direct concern). Isn't it rather something like "all numeric dtypes" that you need there? (As indeed, mixed int and float might not be a problem.)
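A sketch of the looser "all numeric dtypes" check suggested here, assuming pandas.api.types.is_numeric_dtype is an acceptable definition of "numeric". Note that this function also treats bool as numeric. The name all_numeric is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def all_numeric(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every column has a numeric dtype.

    Mixed int and float columns pass, unlike an exact-dtype homogeneity
    check. is_numeric_dtype also returns True for bool columns.
    """
    return all(is_numeric_dtype(dtype) for dtype in df.dtypes)


int_float = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.0]})
int_str = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
print(all_numeric(int_float), all_numeric(int_str))
```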
Yes, most likely. For reference, though, this is directly useful within pandas for determining whether we can safely do a cross-section on a DataFrame containing extension dtypes. If they're all the same type, we want to take a special path. A PR implementing that is incoming, but this felt separate enough to be standalone.
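The observable effect of such a path can be checked with public API. In recent pandas releases (an assumption about version; this PR predates them), taking a row from a frame whose columns all share the nullable Int64 extension dtype returns a Series with that same dtype instead of falling back to object. This sketch only shows the behavior, not how pandas implements it internally.

```python
import pandas as pd

# Two columns sharing one extension dtype (nullable integer "Int64").
df = pd.DataFrame(
    {
        "A": pd.array([1, 2], dtype="Int64"),
        "B": pd.array([3, 4], dtype="Int64"),
    }
)

# Every column has the same dtype, so this is the homogeneous case.
assert len(set(df.dtypes)) == 1

# Cross-section of the first row: a Series holding one value per column.
row = df.iloc[0]
print(row.dtype, row.tolist())
```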
Yes, I know, but that seems more like internal usage (hence my question). But as you said, we can always make it public later if there is demand or a use case.
This should be internal only.
@@ -288,6 +288,26 @@ def _verify_integrity(self, labels=None, levels=None):
    def levels(self):
        return self._levels

    @property
    def _is_homogeneous(self):
        """Whether the levels of a MultiIndex all have the same dtype.
Can you share docstrings at all?
I think they're all different enough that sharing would be burdensome.
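For the MultiIndex case in the diff above, a minimal sketch of the same check through the public MultiIndex.levels attribute. The name levels_homogeneous is hypothetical, and this is not necessarily how the private property is implemented in the PR.

```python
import pandas as pd


def levels_homogeneous(mi: pd.MultiIndex) -> bool:
    """Hypothetical helper: True if all levels of mi share one dtype."""
    return len({level.dtype for level in mi.levels}) <= 1


both_str = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
str_int = pd.MultiIndex.from_tuples([("a", 1), ("a", 2)])
print(levels_homogeneous(both_str), levels_homogeneous(str_int))
```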
Does this look OK? I'd like to get back to SparseArray today, which will depend on this and #22785.
pandas/core/frame.py
>>> DataFrame({"A": [1, 2], "B": [3.0, 4.0]})._is_homogeneous
False

Items with the type but different sizes are considered different
the type -> the same type
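The docstring's point about sizes follows from comparing exact dtypes: int32 and int64 are different dtypes even though both are signed integers. A sketch contrasting that with a looser comparison on dtype.kind; both helper names are illustrative only, not pandas API.

```python
import numpy as np
import pandas as pd


def same_dtype(df: pd.DataFrame) -> bool:
    """Hypothetical exact comparison: int32 and int64 columns count as different."""
    return len(set(df.dtypes)) <= 1


def same_kind(df: pd.DataFrame) -> bool:
    """Hypothetical looser comparison on the one-character dtype kind ('i' for signed ints)."""
    return len({dtype.kind for dtype in df.dtypes}) <= 1


df = pd.DataFrame(
    {
        "A": np.array([1, 2], dtype=np.int32),
        "B": np.array([1, 2], dtype=np.int64),
    }
)
print(same_dtype(df), same_kind(df))
```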
Maybe name it _is_homogeneous_type to be consistent? Otherwise LGTM.
Unlike |
No, it IS different: this is like is_mixed_type (and not like the dtype things). Please change this.
Ahh, I see what you think it should be consistent with. Will fix that in #22785.
If we want to add a suffix, I personally find 'dtype' more logical than 'type', since it is the dtype that is checked for homogeneity.
Split #22325
@jorisvandenbossche suggested moving this off of the BlockManager.
Right now, I've made this public. Do we want that? If so, I'll add it to api.rst, the release notes, etc. Otherwise, I'll make it private.