ENH: is_homogeneous #22780

TomAugspurger · 2018-09-20T11:53:14Z

@jorisvandenbossche suggested moving this off of the BlockManager.

Right now, I've made this public. Do we want that? If so I'll add to api.rst, release note, etc. Otherwise, I'll make it private.

pep8speaks · 2018-09-20T11:53:19Z

Hello @TomAugspurger! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/base.py !
There are no PEP8 issues in the file pandas/core/frame.py !
There are no PEP8 issues in the file pandas/core/indexes/multi.py !
There are no PEP8 issues in the file pandas/tests/frame/test_dtypes.py !
There are no PEP8 issues in the file pandas/tests/indexing/test_multiindex.py !
There are no PEP8 issues in the file pandas/tests/series/test_dtypes.py !

jorisvandenbossche · 2018-09-20T12:28:32Z

I would maybe rather go with private for now. Or do you think of good practical use cases outside of our internals?

TomAugspurger · 2018-09-20T12:49:12Z

My main use case would be when writing a custom scikit-learn estimator: "Can I safely convert this DataFrame to an ndarray without ending up with an ndarray of objects?" Even then, it's not quite perfect, as there are things like converting ints to floats, which may be OK there.

Regardless, let's keep it private for now, and open it up if there's demand.
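
For illustration, the estimator use case above can be approximated with public pandas API only. This is a minimal sketch, not the code in this PR: the helper name `has_single_dtype` is hypothetical, and it only checks that all columns share one dtype.

```python
import pandas as pd


def has_single_dtype(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True when every column shares one dtype."""
    # dtype objects are hashable, so a set collapses equal dtypes.
    return len(set(df.dtypes)) <= 1


ints = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
int_float = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.0]})
mixed = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})

# Same dtype everywhere: conversion keeps that dtype.
ints_ok = has_single_dtype(ints)

# Not homogeneous, yet the conversion upcasts to float64 instead of object.
int_float_ok = has_single_dtype(int_float)
int_float_array_dtype = int_float.to_numpy().dtype

# Numbers mixed with strings fall back to an object ndarray.
mixed_ok = has_single_dtype(mixed)
mixed_array_dtype = mixed.to_numpy().dtype
```

The `int_float` case is the imperfection mentioned above: the check reports `False` even though the resulting ndarray is still numeric.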

codecov · 2018-09-20T12:54:26Z

Codecov Report

Merging #22780 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22780      +/-   ##
==========================================
+ Coverage   92.17%   92.18%   +<.01%     
==========================================
  Files         169      169              
  Lines       50778    50804      +26     
==========================================
+ Hits        46807    46833      +26     
  Misses       3971     3971

Flag	Coverage Δ
#multiple	`90.59% <100%> (ø)`	⬆️
#single	`42.33% <37.5%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/multi.py	`95.45% <100%> (ø)`	⬆️
pandas/core/base.py	`97.61% <100%> (+0.01%)`	⬆️
pandas/core/frame.py	`97.2% <100%> (ø)`	⬆️
pandas/core/dtypes/base.py	`100% <0%> (ø)`	⬆️
pandas/util/testing.py	`86.03% <0%> (ø)`	⬆️
pandas/core/arrays/categorical.py	`95.75% <0%> (+0.01%)`	⬆️
pandas/core/dtypes/dtypes.py	`96.11% <0%> (+0.03%)`	⬆️
pandas/core/common.py	`97.44% <0%> (+0.05%)`	⬆️
pandas/core/dtypes/common.py	`95.02% <0%> (+0.08%)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1c113db...332dbca.

jorisvandenbossche · 2018-09-20T12:54:52Z

Although some dtypes will still convert to object dtype of course (but that's maybe not your direct concern). Isn't it rather something like "all numeric dtypes" that you need there? (as indeed mixed int and float might not be a problem)
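
A sketch of the "all numeric dtypes" check suggested here, using the public `pandas.api.types.is_numeric_dtype`. The helper name `all_numeric` is hypothetical and not part of pandas or this PR.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def all_numeric(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True when every column has a numeric dtype."""
    return all(is_numeric_dtype(dtype) for dtype in df.dtypes)


# Mixed int and float columns pass, even though their dtypes differ.
int_float_numeric = all_numeric(pd.DataFrame({"A": [1, 2], "B": [3.0, 4.0]}))

# A string column makes the check fail.
with_strings_numeric = all_numeric(pd.DataFrame({"A": [1, 2], "B": ["x", "y"]}))
```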

TomAugspurger · 2018-09-20T12:58:59Z

Yes, most likely.

For reference though, this is directly useful within pandas for determining whether we can safely do a cross-section on a DataFrame containing extension dtypes. If they're all the same type we want to take a special path. A PR implementing that is incoming, but this felt separate enough to be standalone.

jorisvandenbossche · 2018-09-20T13:02:01Z

> For reference though, this is directly useful within pandas for determining whether we can safely do a cross-section on a DataFrame containing extension dtypes

Yes, I know, but that seems more internal usage (therefore my question).

But as you said, can always make it public later if there is demand / use case.

jreback · 2018-09-20T13:15:08Z

This should be internal only

jreback · 2018-09-20T13:15:47Z

pandas/core/indexes/multi.py

@@ -288,6 +288,26 @@ def _verify_integrity(self, labels=None, levels=None):
    def levels(self):
        return self._levels

+    @property
+    def _is_homogeneous(self):
+        """Whether the levels of a MultiIndex all have the same dtype.


can u share docstrings at all?

I think they're all different enough that sharing would be burdensome.
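
For readers without the diff at hand, the MultiIndex check described in the docstring above can be sketched with public API. This is an illustration under assumptions, not necessarily the merged implementation; the function name `levels_share_dtype` is hypothetical.

```python
import pandas as pd


def levels_share_dtype(mi: pd.MultiIndex) -> bool:
    """Hypothetical helper: True when all levels have the same dtype."""
    # A set comprehension collapses equal dtypes into a single entry.
    return len({level.dtype for level in mi.levels}) <= 1


same = pd.MultiIndex.from_product([["a", "b"], ["c", "d"]])
different = pd.MultiIndex.from_product([["a", "b"], [1, 2]])

same_result = levels_share_dtype(same)
different_result = levels_share_dtype(different)
```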

TomAugspurger · 2018-09-20T14:40:34Z

Does this look OK? I'd like to get back to SparseArray today, which will depend on this and #22785.

jorisvandenbossche · 2018-09-20T15:05:03Z

pandas/core/frame.py

+        >>> DataFrame({"A": [1, 2], "B": [3.0, 4.0]})._is_homogeneous
+        False
+
+        Items with the type but different sizes are considered different


the type -> the same type
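
To make the "same type but different sizes" case concrete, here is a small sketch using only public pandas and NumPy API. It inspects the column dtypes directly and does not call the private property added in this PR.

```python
import numpy as np
import pandas as pd

# Both columns hold integers, but with different item sizes.
df = pd.DataFrame({
    "A": np.array([1, 2], dtype=np.int32),
    "B": np.array([1, 2], dtype=np.int64),
})

# int32 and int64 are distinct dtypes, so the set has two members.
distinct_dtypes = set(df.dtypes)
n_distinct = len(distinct_dtypes)
```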

jreback · 2018-09-20T15:27:43Z

maybe name _is_homogeneous_type to be consistent?

otherwise lgtm

TomAugspurger · 2018-09-20T15:49:26Z

Unlike is_*_dtype, I don't think there's any ambiguity here as to whether this is an array or dtype that's being checked, so I prefer the shorter name if that's OK.

jreback · 2018-09-20T16:29:31Z

no it IS different, this is like is_mixed_type (and not dtype things)

pls change this

TomAugspurger · 2018-09-20T16:45:28Z

Ahh, I see what you think it should be consistent with. Will fix that in #22785.

jorisvandenbossche · 2018-09-20T16:47:48Z

If we want to add a suffix, I personally find 'dtype' more logical than 'type', since it is the dtype that is checked to be homogeneous

ENH: is_homogenous

90c76cb

private

a5fef74

TomAugspurger added the "Dtype Conversions" label (Unexpected or buggy dtype conversions) Sep 20, 2018

TomAugspurger added this to the 0.24.0 milestone Sep 20, 2018

jreback reviewed Sep 20, 2018

TomAugspurger mentioned this pull request Sep 20, 2018

Preserve Extension type on cross section #22785

Merged

set comprehension

528bbd1

jorisvandenbossche approved these changes Sep 20, 2018

fixed typo

332dbca

TomAugspurger merged commit 0480f4c into pandas-dev:master Sep 20, 2018

TomAugspurger deleted the is_homogenous branch September 20, 2018 16:25

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

ENH: _is_homogeneous (pandas-dev#22780)

4e31fe6
