How should DataFrame.append behave related to indexes types? #22957
Comments
I think the rule is that appending any two different types of indexes ends up with an object-dtype index. The specific example you showed, Index and PeriodIndex, is a definite bug.
I don't know much about internals, but digging into the bug, this is the code being run:
Where,
So it looks like
FYI, I suspect that all the PeriodIndex ones will be solved by #22862 (which will be in 0.24).
Thanks, will take that into account (:
So this is tested independently (though I suspect it's not fully tested for a cartesian product of Indexes). Look at Index.append (and subclasses of Index).
Will take a look :). I also believe that some of the errors are raised from functions like Index.union and Index.difference.
So if I'm reading this right, the reason I am getting that exact example error from the first post when running append is that I have columns in my two frames that share a name but contain different types of data? EDIT: Is there an easy way to find out what the offending columns are? I tried casting every column in my two dataframes to strings (resulting in the dtype being object), but that didn't seem to fix it. My use case is two separate CSVs with intersecting columns as well as unique columns. I want to append them by stacking them into one CSV, retaining the unique columns from each one and filling them with None/NaN/Null for the rows from the opposing CSV. E.g. df1:
df2:
merged df:
Except in my case obviously there are many more shared and unique columns (~15 shared columns, 10 unique columns each, for a total of 35 columns in the merged CSV/dataframe).
The example with Index and PeriodIndex looks to work on master. Could use a test.
I'm currently working on refactoring the code of DataFrame.append (#22915). One question that came up was what the behavior should be when appending DataFrames with different index types. To simplify this, let's put the focus on the index itself, and assume that it is valid for both rows and columns (IMHO consistency between the two might be good).
Code Sample
Current Behavior
Index 0 (rows)
All types of indexes work together, except for:
CategoricalIndex + {another}
When the types don't match, the result is cast to object.
There's also a bug that happens when appending MultiIndex + DatetimeIndex. I will verify it in more detail later and raise an issue.
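The cast to object can be observed on the index objects themselves. The following is only a minimal sketch, assuming that Index.append follows the same rule as the row index in DataFrame.append; the values are illustrative and are not taken from the original code sample.

```python
import pandas as pd

# Illustrative indexes of three different types.
ints = pd.Index([1, 2, 3])
strs = pd.Index(["a", "b"])
dates = pd.date_range("2018-01-01", periods=2)

# Appending indexes of different types falls back to object dtype.
mixed_int_str = ints.append(strs)
mixed_date_int = dates.append(ints)

print(mixed_int_str.dtype)   # object
print(mixed_date_int.dtype)  # object
```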
Index 1 (columns)
All types of indexes work together, except for:
IntervalIndex + {another}
MultiIndex + {another}
{another} + PeriodIndex
{another} + IntervalIndex
{another} + MultiIndex
DatetimeIndex + {another} (works for df.append(series))
TimedeltaIndex + {numeric} (works for df.append(series))
PeriodIndex + {another} (works for df.append(series))
The kinds of exceptions raised here are the most varied and sometimes very cryptic.
For example:
Suggestion
Since index 0 already allows almost all types of index to be concatenated, we don't want to restrict this behavior. My suggestion is to allow any types of indexes to be merged together (and possibly raise a RuntimeWarning when a cast is necessary).
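One way the optional warning could look is sketched below. This is only an illustration of the idea: append_with_warning is a hypothetical helper, not part of pandas, and it assumes that "a cast was necessary" means the result has object dtype while at least one input did not.

```python
import warnings

import pandas as pd


def append_with_warning(left, right):
    """Hypothetical helper (not a pandas API): append and warn on a cast to object."""
    result = left.append(right)
    # Assumption: a cast happened if the result is object but an input was not.
    inputs_object = left.dtype == object and right.dtype == object
    if result.dtype == object and not inputs_object:
        warnings.warn(
            f"appending {type(left).__name__} and {type(right).__name__} "
            "required a cast to object",
            RuntimeWarning,
            stacklevel=2,
        )
    return result


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Same type: no cast expected.
    same = append_with_warning(pd.Index([1, 2]), pd.Index([3, 4]))
    # Integer index + DatetimeIndex: the result falls back to object.
    mixed = append_with_warning(
        pd.Index([1, 2]), pd.date_range("2018-01-01", periods=2)
    )

runtime_warnings = [w for w in caught if issubclass(w.category, RuntimeWarning)]
print(same.dtype, mixed.dtype, len(runtime_warnings))
```

Whether such a warning belongs in Index.append itself or only in DataFrame.append is left open here.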
For example, joining two indexes of the same type should produce the same type (as in the current behavior):
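A minimal sketch of the same-type case, using Index.append directly rather than DataFrame.append; the values are illustrative and do not reproduce the original example.

```python
import pandas as pd

# Same-type pairs (illustrative values).
dates = pd.date_range("2018-01-01", periods=2).append(
    pd.date_range("2018-02-01", periods=2)
)
deltas = pd.to_timedelta(["1 days", "2 days"]).append(pd.to_timedelta(["3 days"]))
ints = pd.Index([1, 2]).append(pd.Index([3, 4]))

print(type(dates).__name__)   # DatetimeIndex
print(type(deltas).__name__)  # TimedeltaIndex
print(ints.dtype)             # int64
```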
Joining two indexes of different types usually upcasts to object (only when necessary).
Empty indexes
I believe that empty indexes (e.g. those created in a DataFrame with no rows or columns) should be ignored when calculating the final dtype of an index merge. This seems like it is already the current behavior:
However, when an empty index has a dtype different from object, we may want to preserve it (as it may have been created explicitly by the user).
Sorry if this was too long; this is my first contribution to open source and I still haven't got the hang of how things work. Any suggestions, whether related to the issue or meta, are welcome!