API: concatting of Series/DataFrame - handling (not skipping) of empty objects #39122
Comments
@jorisvandenbossche IIUC the change you're suggesting would change this test to expect object dtype; can you confirm?
Yes, that was also discussed on the PR in question: #36115 (comment)
Do we have a running list of where we currently have these behaviors?
For value-dependent behaviour in the concat itself in general, there are several cases "encoded" in pandas/pandas/core/dtypes/concat.py (lines 24 to 61 in 9a52a81).
But specifically for where this happens for empty / non-empty inconsistencies, I don't think we have a running list, except for the issue linked in the first bullet point about DataFrames in the top post here.
Some notes on what it would take to change this. AFAICT it boils down to disabling a check in
and flipping a condition in
Doing this breaks 10 tests for me locally:
Of these, 5 were introduced by #47372, which was specifically about restoring the current behavior. I.e. changing this would be a breaking change, but it doesn't look like it would be particularly tough to implement.
Concat of a non-empty dataframe with an empty one resulting in the non-empty one is natural behaviour. Hope you won't change it. Example:
Follow-up on #38843 and #39035.
Currently, we generally (some exceptions can be considered bugs, I think) do not drop empty objects when concatting DataFrames, but we do explicitly drop empties when concatting Series (in `dtypes/concat.py::concat_compat`, for `axis==0`).

We should make this consistent throughout pandas, and generally I would argue for not skipping empties: when not skipping empty objects, the resulting dtype of a concat operation depends only on the input dtypes, and not on the exact content (the exact values, or how many values there are, i.e. the shape). In general we want to get rid of value-dependent behaviour. In the past we discussed this in the context of certain values (e.g. presence of NaNs or not), but I think the shape should not matter either (e.g. when slicing dataframes before concatting, you can get empties or not depending on the values).
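To make the shape dependence concrete, here is a minimal sketch in which the same code path produces an empty or a non-empty piece depending only on the data. The helper `concat_dtype_after_filter` and all variable names are hypothetical, and the dtype returned for the empty case depends on whether the installed pandas version skips empties.

```python
import warnings

import pandas as pd


def concat_dtype_after_filter(threshold):
    """Filter an object Series by value, then concat it onto an int64 Series.

    Hypothetical helper for illustration only; not part of pandas.
    """
    ints = pd.Series([1, 2, 3], dtype="int64")
    objs = pd.Series(["a", "b", "c"], dtype="object")
    values = pd.Series([10, 20, 30])

    # Whether `piece` is empty depends only on the values, not on the dtypes.
    piece = objs[values > threshold]

    with warnings.catch_warnings():
        # Some pandas versions emit a FutureWarning about empty entries.
        warnings.simplefilter("ignore", FutureWarning)
        result = pd.concat([ints, piece])
    return len(piece), result.dtype


# Non-empty piece: int64 + object gives object.
print(concat_dtype_after_filter(15))
# Empty piece: the result dtype depends on whether empties are skipped.
print(concat_dtype_after_filter(100))
```

If empties are skipped, the second call reports a different dtype from the first even though the input dtypes are identical, which is the value-dependent behaviour argued against above.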
If people agree on going the way of not skipping empties in `concat` (and `append`, and friends), some different areas of work:

- `concat_compat` to not skip empty nullable EAs

So IMO it's mainly the last bullet point (Series/DataFrame with longer-existing EAs) that requires some more discussion on how we want to change it.
Some illustrative examples:
For Series with basic dtype (int64), int64 + object dtype results in object dtype, but not when the object dtype Series is empty:
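A minimal sketch of the Series case, with illustrative variable names. The int64 result for the empty case is the behaviour this issue describes; other pandas versions may return object instead.

```python
import warnings

import pandas as pd

ints = pd.Series([1, 2], dtype="int64")
objs = pd.Series(["a"], dtype="object")
empty_objs = pd.Series([], dtype="object")

with warnings.catch_warnings():
    # Some pandas versions emit a FutureWarning about empty entries.
    warnings.simplefilter("ignore", FutureWarning)
    non_empty_result = pd.concat([ints, objs])
    empty_result = pd.concat([ints, empty_objs])

print(non_empty_result.dtype)  # object
# int64 when the empty object Series is skipped (the behaviour described here);
# object when empties are taken into account.
print(empty_result.dtype)
```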
For DataFrame, you can see that the int64 + object always gives object (even when one is empty), but for period dtype, the empty object dtype gets ignored:
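A minimal sketch of the DataFrame case, again with illustrative names and a column label `"a"` chosen only for this example. Expected dtypes are deliberately not asserted in comments, since the outcome depends on the pandas version; the issue reports object for the int64 case and period for the period case.

```python
import warnings

import pandas as pd

df_int = pd.DataFrame({"a": pd.Series([1, 2], dtype="int64")})
df_period = pd.DataFrame(
    {"a": pd.period_range("2012-01-01", periods=2, freq="D")}
)
df_empty_obj = pd.DataFrame({"a": pd.Series([], dtype="object")})

with warnings.catch_warnings():
    # Some pandas versions emit a FutureWarning about empty entries.
    warnings.simplefilter("ignore", FutureWarning)
    int_result = pd.concat([df_int, df_empty_obj])
    period_result = pd.concat([df_period, df_empty_obj])

# int64 + empty object column
print(int_result["a"].dtype)
# period[D] + empty object column
print(period_result["a"].dtype)
```

Comparing the two printed dtypes shows whether the empty object column was taken into account consistently for both a NumPy dtype and an extension dtype.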
cc @pandas-dev/pandas-core