-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Concat multiple different ExtensionArray types #22997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Concat multiple different ExtensionArray types #22997
Conversation
Hello @TomAugspurger! Thanks for submitting the PR.
|
pandas/core/dtypes/base.py
Outdated
return all( | ||
getattr(self, attr) == getattr(other, attr) | ||
for attr in self._metadata | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corner case: other is an instance of a subclass of type(self) and self._metadata != other._metadata
@@ -560,11 +560,6 @@ def _concat_sparse(to_concat, axis=0, typs=None): | |||
|
|||
fill_values = [x.fill_value for x in to_concat | |||
if isinstance(x, SparseArray)] | |||
|
|||
if len(set(fill_values)) > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was leftover from an early implementation that didn't handle multiple fill values. And I think @jorisvandenbossche noticed this in his review, but I misunderstood.
Regardless, we do allow concating Series[sparse] for different fill_values, we just change them all to have the first one's fill value (in SparseSeries._concat_same_type
)
This relates to the commits adding hashing (which I was building on top of). I think that's probably up to the EA author to define a |
Codecov Report
@@ Coverage Diff @@
## master #22997 +/- ##
==========================================
+ Coverage 92.14% 92.14% +<.01%
==========================================
Files 170 170
Lines 51016 51015 -1
==========================================
+ Hits 47009 47010 +1
+ Misses 4007 4005 -2
Continue to review full report at Codecov.
|
I had to skip some decimal tests on py2. I couldn't conditionally skip individual tests that had used fixtures defined on the test, e.g. So I opted to skip all the tests on Py2 that had at least one test case failing. |
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -602,6 +602,7 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your | |||
- :meth:`Series.astype` and :meth:`DataFrame.astype` now dispatch to :meth:`ExtensionArray.astype` (:issue:`21185:`). | |||
- Slicing a single row of a ``DataFrame`` with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`) | |||
- Added :meth:`pandas.api.types.register_extension_dtype` to register an extension type with pandas (:issue:`22664`) | |||
- Bug in concatenation an Series with two different extension dtypes not casting to object dtype (:issue:`22994`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double back ticks
I think this is good to go. Merging later today. FYI, I'm leaving #22994 open to discuss a more general concat protocol so that types can negotiate on what should happen. I'm hopeful that will clean up all the special concats we have like |
Thanks! |
xref #22994
This builds on #22996, since we need hashing.
2a1660c is the relevant commit.
Sparse tests will fail for now. I'll revisit once SparseArray is in.