-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF/INT: preserve block type in joining when only having a single block #17954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF/INT: preserve block type in joining when only having a single block #17954
Conversation
Codecov Report
@@ Coverage Diff @@
## master #17954 +/- ##
==========================================
- Coverage 91.23% 91.22% -0.02%
==========================================
Files 163 163
Lines 50113 50121 +8
==========================================
- Hits 45723 45722 -1
- Misses 4390 4399 +9
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17954 +/- ##
==========================================
+ Coverage 91.23% 91.25% +0.01%
==========================================
Files 163 163
Lines 50113 50173 +60
==========================================
+ Hits 45723 45786 +63
+ Misses 4390 4387 -3
Continue to review full report at Codecov.
|
pandas/core/internals.py
Outdated
if len(join_units) == 1 and not join_units[0].indexers: | ||
b = join_units[0].block | ||
values = b.values | ||
if copy: # and values.base is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I first had this with and values.base is not None:
but that failed one test in merge
. But I copied that logic from concatenate_join_units
:
Not sure why there it is needed, but here it fails.
@jreback can you have a look at this one? (would like to merge it for 0.21.0, although it is messing with the internals after the rc. But I think you have a better idea on how safe this change is) |
pandas/core/internals.py
Outdated
values = b.values | ||
if copy: # and values.base is not None: | ||
values = values.copy() | ||
elif not copy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason for this special case? IOW, why can't you handle this in concat_same_type
(maybe pass in copy
flag if that helps), IOW you can test there that you only have 1 block and write the same code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the separate special case because it occurs in different cases (and so how it is here, is more close to the original code), but could add it to the other special case. But, that makes concat_same_type
more complex. Currently it does not need to bother about copying the data or not in this special case of to_concat
values of 1, because concatenate always takes a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I guess. Still like to come back in and re-org this to make as clear as possible during next release. Do appreciate the support for CustomBlocks, though worry we are adding even more technical debt here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah what I would actually do is define a _concatentor
for the base case (e.g. where we have staticmethod(np.concatenate)
; in pandas/core/dtypes/concat.py
, call it _concat_arrays
or something. This (and you would have to modify all the others to take a copy=
kwarg, then this code become much simpler & less special cased.
Then all of the concat logic is in a single place and copies are handled there. Pls create an issue for 0.22 for this (you may already have one).
Added to the 0.21 milestone, but will leave up to @jreback to decide if it should be pushed. |
Opened issue at #17997 |
lgtm. |
Appveyor finished, but hasn't updated the checkmark yet. Merging. |
…ck (pandas-dev#17954) * REF/INT: preserve block type in joining when only having a single block * add test * remove checking of values.base * clean-up comment
…ck (pandas-dev#17954) * REF/INT: preserve block type in joining when only having a single block * add test * remove checking of values.base * clean-up comment
xref #17283