I see, thanks for the quick reply!
I usually read the API reference docs and haven't read through the user guide. This appears only as a footnote in the user guide, and I found the behavior unexpected.
Note that in the example, the join is in practice a 1-to-1 join, since the keys are unique.
IMHO, anything from documenting this in the `merge` reference to adding a parameter that controls this behavior would be great.
For instance, a `reindexing` parameter could be handy. It could take the values:
- `left`: keep the index from the left DataFrame; if this is impossible due to a many-to-many join, raise an error
- `auto`: the current behavior; keep the index if merging on unique indexes, otherwise reset it
- `reset`: always reset the index
Having the parameter available in `merge` would also be an opportunity to document this gotcha.
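The proposed option can be roughly approximated in user code today. The sketch below is hypothetical: `merge_reindex`, its `reindex` argument and the `__left_pos__` scratch column are illustrative names, not pandas API. For `reindex="left"` it tags each left row with its position, merges with `validate="many_to_one"` (which raises `pandas.errors.MergeError` if the right-hand keys are duplicated, i.e. whenever a left row could be repeated, a slightly stricter check than the many-to-many case in the proposal), and then restores the left labels from those positions:

```python
import pandas as pd


def merge_reindex(left, right, on, how="left", reindex="auto"):
    """Hypothetical helper sketching the proposed reindexing option.

    reindex="auto":  plain DataFrame.merge behaviour.
    reindex="reset": always return a fresh 0..n-1 index.
    reindex="left":  keep left's index labels; raise if right keys repeat.
    """
    if reindex == "auto":
        return left.merge(right, on=on, how=how)
    if reindex == "reset":
        return left.merge(right, on=on, how=how).reset_index(drop=True)
    if reindex != "left":
        raise ValueError(f"unknown reindex option: {reindex!r}")
    if how not in ("left", "inner"):
        # Rows that come only from `right` would have no left index label.
        raise ValueError("reindex='left' only supports how='left' or 'inner'")

    pos_col = "__left_pos__"  # hypothetical scratch column name
    tagged = left.assign(**{pos_col: range(len(left))})
    # Raises MergeError if a key appears more than once in `right`.
    out = tagged.merge(right, on=on, how=how, validate="many_to_one")
    # Map each output row back to the label of the left row it came from.
    out.index = left.index[out[pos_col].to_numpy()]
    return out.drop(columns=pos_col)


# Illustrative data: a left frame with a meaningful index.
df = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]}, index=[10, 20, 30])
lookup = pd.DataFrame({"key": ["a", "b"], "y": [4, 5]})

print(merge_reindex(df, lookup, on="key", reindex="left").index.tolist())
# [10, 20, 30]
print(merge_reindex(df, lookup, on="key", reindex="auto").index.tolist())
# [0, 1, 2]
```

Tracking row positions rather than calling `reset_index`/`set_index` avoids guessing the name of the column that `reset_index` creates, and it also works for named or multi-level indexes.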
In my ML project, this issue caused training data to leak into the test data because I was relying on the index to keep them apart.
The only way to preserve the left index seems to be to call `reset_index` before the merge and `set_index` after it, which surely defeats the purpose of the index, namely to make joins faster?
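A sketch of the `reset_index`/`set_index` round trip, next to two alternatives that join the left *column* against the right-hand *index*. The `merge` docstring states that when joining columns on columns the indexes are ignored, while when joining indexes on a column the index is passed on; in these calls the index passed on is `df`'s. The frames and the `row_id` index name are illustrative assumptions:

```python
import pandas as pd

# Illustrative data; naming the index makes the round trip unambiguous.
df = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]},
                  index=pd.Index([10, 20, 30], name="row_id"))
lookup = pd.DataFrame({"key": ["a", "b", "c"], "y": [4, 5, 6]})

# 1) The reset_index / set_index round trip.
via_reset = (
    df.reset_index()
      .merge(lookup, on="key", how="left")
      .set_index("row_id")
)

# 2) Left column joined to the right index: df's index is kept.
via_right_index = df.merge(lookup.set_index("key"), left_on="key",
                           right_index=True, how="left")

# 3) DataFrame.join with `on` performs the same column-to-index lookup.
via_join = df.join(lookup.set_index("key"), on="key")

print(via_reset.index.tolist())        # [10, 20, 30]
print(via_right_index.index.tolist())  # [10, 20, 30]
print(via_join.index.tolist())         # [10, 20, 30]
```

Options 2 and 3 require the right-hand keys to be moved into the index first, which is a natural fit when `lookup` is a keyed table.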
The method `df.merge` will throw away / reset the index of `df`. I don't think this is the desired result. Here's an example:
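A minimal reproduction along these lines, assuming a left frame `df` whose integer index carries meaning and a lookup table `lookup` keyed by a unique column (both frames are illustrative):

```python
import pandas as pd

# Left frame whose index labels matter (e.g. they identify a split).
df = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]},
                  index=[10, 20, 30])
# Lookup table keyed by a unique column, so the join is effectively 1-to-1.
lookup = pd.DataFrame({"key": ["a", "b", "c"], "y": [4, 5, 6]})

merged = df.merge(lookup, on="key", how="left")

print(df.index.tolist())      # [10, 20, 30]
print(merged.index.tolist())  # [0, 1, 2]: column-on-column merge drops the index
```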
The result will lose the very important index in `df` and instead will have a generic `0 1 2` index. Here is a method based on `merge` that will preserve the original index (which should be the intended result IMHO):
which outputs