Hello,
I see a mysterious behaviour of the merge function in pandas, and I don't know if it's a bug or if I missed something.
If I use the parameters left_on and right_on with how='outer', a key is missing and replaced with NA. If I use the same name for both key columns and use on='key', all the keys are available. Is this normal behaviour? Is there any difference between on and left_on/right_on?
Please see the examples below:
# Example 1: NA for key=2
import pandas as pd
a = pd.DataFrame({'aa': [1,3,4,5], 'bb':[1,2,3,4]})
b = pd.DataFrame({'ab': [1,2,3,4,5], 'cc':[1,2,3,4,5]})
pd.merge(a, b, left_on='aa', right_on='ab', how='outer')
# Example 2: no NA
a = pd.DataFrame({'aa': [1,3,4,5], 'bb':[1,2,3,4]})
b = pd.DataFrame({'aa': [1,2,3,4,5], 'cc':[1,2,3,4,5]})
pd.merge(a, b, on='aa', how='outer')
If you join with left_on and right_on and the two columns have different names, both columns are kept in the output, so you see NA wherever a key exists on only one side. If you join with on, the left and right columns must share the same name, so an outer join returns a single key column containing the union of the keys.
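To make this visible, here is a small sketch based on the first example from the question. The `indicator=True` argument and the helper column name `key` are additions for illustration only and are not part of the original example; `fillna` is just one possible way to coalesce the two key columns.

```python
import pandas as pd

a = pd.DataFrame({'aa': [1, 3, 4, 5], 'bb': [1, 2, 3, 4]})
b = pd.DataFrame({'ab': [1, 2, 3, 4, 5], 'cc': [1, 2, 3, 4, 5]})

# Differently named keys: both 'aa' and 'ab' are kept in the result.
# indicator=True adds a '_merge' column saying which side each row came from.
merged = pd.merge(a, b, left_on='aa', right_on='ab', how='outer', indicator=True)

# The row for key 2 exists only in b, so 'aa' (and 'bb') are NaN there.
right_only = merged[merged['_merge'] == 'right_only']
print(right_only)

# One way to get a single key column: fill the gaps in 'aa' from 'ab'.
# 'key' is a hypothetical column name chosen for this sketch.
merged['key'] = merged['aa'].fillna(merged['ab']).astype(int)
print(merged[['key', 'bb', 'cc']])
```

The key 2 is therefore not lost: it is still present in the 'ab' column, and only the left-hand key column is empty for that row.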
Your examples are not equivalent. If you change your first example to the following, you get the same result as in your second example:
a = pd.DataFrame({'aa': [1,3,4,5], 'bb':[1,2,3,4]})
b = pd.DataFrame({'aa': [1,2,3,4,5], 'cc':[1,2,3,4,5]})
pd.merge(a, b, left_on='aa', right_on='aa', how='outer')
Hello, thanks for your answer. Why does the column name matter? The merge is done by matching the columns' content, right? Not the name, so calling it 'aa' or 'ab' should give the same output?
Yes, of course, but left_on and right_on keep both columns if they have different names, which is why you get your result. If both columns have the same name, the output contains a single column holding the union of both.
Just try my example; the result is the same as the result of your second example.
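A quick way to check this claim, as a sketch: the variable names below are made up for illustration, and renaming 'ab' to 'aa' before merging is shown only as one possible way to get a single key column when the names differ.

```python
import pandas as pd

a = pd.DataFrame({'aa': [1, 3, 4, 5], 'bb': [1, 2, 3, 4]})
b = pd.DataFrame({'aa': [1, 2, 3, 4, 5], 'cc': [1, 2, 3, 4, 5]})

# Same key name on both sides: on= and left_on=/right_on= are two
# spellings of the same join.
with_on = pd.merge(a, b, on='aa', how='outer')
with_left_right = pd.merge(a, b, left_on='aa', right_on='aa', how='outer')
print(with_on.equals(with_left_right))

# If the right frame uses a different key name ('ab'), renaming it first
# lets you merge with on= and get a single key column.
b_other_name = pd.DataFrame({'ab': [1, 2, 3, 4, 5], 'cc': [1, 2, 3, 4, 5]})
renamed = pd.merge(a, b_other_name.rename(columns={'ab': 'aa'}), on='aa', how='outer')
print(renamed.equals(with_on))
```

In both cases the result has one 'aa' column containing every key from either frame, with NaN only in 'bb' for the key that is missing from a.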