-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: concat of Series of EA and other dtype fails #20840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -64,6 +64,31 @@ def test_concat_mixed_dtypes(self, data): | |
expected = pd.concat([df1.astype('object'), df2.astype('object')]) | ||
self.assert_frame_equal(result, expected) | ||
|
||
result = pd.concat([df1['A'], df2['A']]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this just for axis=1? I would make this a separate test. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The bug is just with For Though I don't see any tests for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, thought we already had some, but indeed, apparently not. Will add some. |
||
expected = pd.concat([df1['A'].astype('object'), | ||
df2['A'].astype('object')]) | ||
self.assert_series_equal(result, expected) | ||
|
||
def test_concat_columns(self, data, na_value): | ||
df1 = pd.DataFrame({'A': data[:3]}) | ||
df2 = pd.DataFrame({'B': [1, 2, 3]}) | ||
|
||
expected = pd.DataFrame({'A': data[:3], 'B': [1, 2, 3]}) | ||
result = pd.concat([df1, df2], axis=1) | ||
self.assert_frame_equal(result, expected) | ||
result = pd.concat([df1['A'], df2['B']], axis=1) | ||
self.assert_frame_equal(result, expected) | ||
|
||
# non-aligned | ||
df2 = pd.DataFrame({'B': [1, 2, 3]}, index=[1, 2, 3]) | ||
expected = pd.DataFrame({ | ||
'A': data._from_sequence(list(data[:3]) + [na_value]), | ||
'B': [np.nan, 1, 2, 3]}) | ||
result = pd.concat([df1, df2], axis=1) | ||
self.assert_frame_equal(result, expected) | ||
result = pd.concat([df1['A'], df2['B']], axis=1) | ||
self.assert_frame_equal(result, expected) | ||
|
||
def test_align(self, data, na_value): | ||
a = data[:3] | ||
b = data[2:5] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this very strange that you need to add axis=1 here, I don't think this routine should be call at all if axis=1. This seems like it should be handled at a higher level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
axis=1
is used all over_concat_compat
. Why do you think it's strange?For the specific case from #20832 it is
axis=0
, but we don't want to reshape to 2d.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the special casing non-consolidateble blocks with
axis=1
is indeed an existing pattern.For categoricals:
pandas/pandas/core/dtypes/concat.py
Lines 220 to 221 in 563a6ad
for datetimetz:
pandas/pandas/core/dtypes/concat.py
Lines 453 to 454 in 563a6ad
Note that this is all rather confusing, as
axis=1
in the "concatting code" actually meansaxis=0
in user facing code (because we store the 1D data in 2D objects):pandas/pandas/core/reshape/concat.py
Lines 312 to 315 in 563a6ad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right this should be changed to use
_get_block_manager_axis
FYI (but can be in future).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so,
_get_block_manager_axis
is for a full Series/DataFrame, the logic here is dtype-dependentThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does your respsone have to do with my comment? this is exactly what _get_block_managed_axis does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, my response was not very well phrased, but still, then you will have to explain your comment better :-)
I understood your comment as: "we should use
_get_block_manager_axis
here ", and AFAIK,_get_block_manager_axis
lets you convert an axis of 0/1 to the appropriate number for blocks (so for DataFrame it switches 0 and 1). But here, I don't convert anyaxis
argument, I reshape data depending on theaxis
value (and depending on the dtype).So I don't fully understand where in the above code lines I would use
_get_block_manager_axis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is a more correct usage, though prob doesn't simplify code much. this is spagetti anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On that we certainly agree! :-)