0.23.1 concat drops the name of the merge axis when not aligned #21629
Comments
I think there's a typo in your first example. That said, what is the use case for this? It seems counter-intuitive to me to have two different Index objects with the same name. You could just as easily assign that name after the concat.
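A minimal sketch of assigning the name after the concat, assuming the axis name is known in advance. The frames `df1`/`df2` and the labels are hypothetical; the name 'ID' is taken from the issue. `rename_axis` sets the column-axis name explicitly, so the result does not depend on whether `concat` kept it.

```python
import pandas as pd

# Hypothetical frames whose column axes share the name 'ID' but are not aligned.
df1 = pd.DataFrame([[1, 2]], columns=pd.Index(["a", "b"], name="ID"))
df2 = pd.DataFrame([[3, 4]], columns=pd.Index(["a", "c"], name="ID"))

# Concatenate along the rows, then (re)assign the column-axis name explicitly.
result = pd.concat([df1, df2], sort=False).rename_axis("ID", axis="columns")
print(result.columns.name)  # ID
```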
Fixed the typo.
The use case could be keeping the name when all the indexes share an identical name. There are many reasons why the DataFrames you want to concat might end up misaligned. However, I don't know to what extent pandas has prior art on keeping the name or not when the indexes are not identical. At least
which seems an indication to me that
Thanks for fixing the typo. There are many use cases. For us, the most prevalent one is large multivariate time series. We split them by time (the Index) for easier storage and updates (typically the last few slices are the most frequently updated). There are thousands of columns, and typically their intersection is at least 99% of their union. When we concatenate these frames, we would like the name of the axis to remain. That said, I was under the false impression that the behavior had changed in 0.23, but that is not the case (I checked many versions from 0.15.0 to 0.22.0). I thought so because our code used to be different: it built the index union itself, reindexed all frames, and only then concatenated (this was faster). As @jorisvandenbossche pointed out, we had to change that part in response to the way 0.23 concat now handles a misaligned non-concatenating index. I still believe the behavior should be to retain the name of the index during concat. Either take the first one (as
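The earlier approach mentioned in this comment (build the union of the column labels, reindex every frame onto it, then concat) can be sketched roughly as follows. `concat_via_union` is a hypothetical helper, not the commenter's actual code, and it assumes every frame's columns carry the same name. Because all frames end up with the same column Index, `concat` has nothing to align and the name is kept.

```python
import functools

import pandas as pd


def concat_via_union(frames):
    """Hypothetical helper: align column axes on their union, then stack rows.

    Assumes every frame's columns carry the same name (taken from frames[0]).
    """
    name = frames[0].columns.name
    # Union of all column labels; renamed explicitly so the name is guaranteed.
    union = functools.reduce(
        lambda left, right: left.union(right), (df.columns for df in frames)
    ).rename(name)
    # Reindexing onto one shared Index aligns every frame (missing -> NaN).
    aligned = [df.reindex(columns=union) for df in frames]
    return pd.concat(aligned)


df1 = pd.DataFrame([[1, 2]], columns=pd.Index(["a", "b"], name="ID"))
df2 = pd.DataFrame([[3, 4]], columns=pd.Index(["a", "c"], name="ID"))
combined = concat_via_union([df1, df2])
print(combined.columns.name)  # ID
```

The trade-off is an extra reindex per frame, which the commenter reports was still faster in their setting.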
If the indexes of the DataFrames differ, you can copy the index name of the first DataFrame onto the concatenated result. This works whether the indexes themselves differ or their names differ. Of course, it doesn't solve the problem fundamentally.
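The workaround from the previous comment could look like this minimal sketch. `concat_keep_first_names` is a hypothetical name, and the helper assumes DataFrame inputs with an integer `axis` of 0 or 1. It copies the first frame's names onto the non-concatenating axis of the result, whatever the other frames are named.

```python
import pandas as pd


def concat_keep_first_names(objs, axis=0):
    """Hypothetical helper: concat DataFrames, then copy the first frame's
    names onto the non-concatenating axis of the result (a workaround only).
    """
    objs = list(objs)
    nc_axis = 1 - axis  # the axis that is unioned rather than stacked
    first_names = list(objs[0].axes[nc_axis].names)
    result = pd.concat(objs, axis=axis, sort=False)
    return result.rename_axis(first_names, axis=nc_axis)


# Misaligned columns with different names: the first frame's name wins.
df1 = pd.DataFrame([[1, 2]], columns=pd.Index(["a", "b"], name="ID"))
df2 = pd.DataFrame([[3, 4]], columns=pd.Index(["a", "c"], name="other"))
rows = concat_keep_first_names([df1, df2])
print(rows.columns.name)  # ID
```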
Is this the same as #13475? I was working on a PR for that one and it seems to handle this case as well.
Same here on pandas 0.24.2.
Closing as a duplicate of #13475; this was fixed by that PR.
Code Sample, a copy-pastable example if possible
When the columns are aligned, there is no problem: the columns in the result have the correct name ('ID' here).
However, when the columns are not aligned, the name seems to disappear:
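The original code sample is not preserved here. As a hypothetical stand-in (not the reporter's data), the two situations look like this: identical column axes named 'ID', versus column axes that share the name but not the labels. Only the aligned case is certain to keep the name; what the misaligned case prints depends on the pandas version.

```python
import pandas as pd

# Hypothetical data: every column axis is named 'ID'.
aligned_1 = pd.DataFrame([[1, 2]], columns=pd.Index(["a", "b"], name="ID"))
aligned_2 = pd.DataFrame([[3, 4]], columns=pd.Index(["a", "b"], name="ID"))
misaligned = pd.DataFrame([[5, 6]], columns=pd.Index(["a", "c"], name="ID"))

same_cols = pd.concat([aligned_1, aligned_2])
diff_cols = pd.concat([aligned_1, misaligned], sort=False)

print(same_cols.columns.name)  # kept: the column axes are identical
print(diff_cols.columns.name)  # reported as None on 0.23.x; version-dependent
```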
Problem description
When concatenating DataFrames, I expect the non-concatenating axis (the columns axis, in the examples above) to keep its name(s).
An interesting question arises if the instances of the non-concatenating axis are not only misaligned but also have different names. In that case, we could use the majority value or drop the name altogether (None). In our code, we use:
```python
import collections
names = collections.Counter([df.axes[nc_axis].names for df in objs]).most_common(1)[0][0]
```
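A minimal sketch of applying that majority vote end to end, assuming DataFrame inputs and an integer `axis` of 0 or 1. `concat_majority_names` is a hypothetical helper built around the Counter one-liner above; the names are converted to tuples so they are safely hashable. `Counter.most_common` orders equal counts by first occurrence, so ties go to the names seen first.

```python
import collections

import pandas as pd


def concat_majority_names(objs, axis=0):
    """Hypothetical helper: concat DataFrames and give the non-concatenating
    axis the most common `names` among the inputs (ties: first seen wins).
    """
    objs = list(objs)
    nc_axis = 1 - axis  # 0 -> columns are unioned, 1 -> index is unioned
    counts = collections.Counter(tuple(df.axes[nc_axis].names) for df in objs)
    names = list(counts.most_common(1)[0][0])
    result = pd.concat(objs, axis=axis, sort=False)
    return result.rename_axis(names, axis=nc_axis)


frames = [
    pd.DataFrame([[1, 2]], columns=pd.Index(["a", "b"], name="ID")),
    pd.DataFrame([[3, 4]], columns=pd.Index(["a", "c"], name="ID")),
    pd.DataFrame([[5, 6]], columns=pd.Index(["b", "c"], name="other")),
]
majority = concat_majority_names(frames)
print(majority.columns.name)  # ID
```

Returning `None` instead when no name holds a strict majority would be an equally reasonable policy; this sketch simply follows the majority-value option described above.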
Expected Output
Output of `pd.show_versions()`