Skip to content

merging on dates ignores sort keyword #2098

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jseabold opened this issue Oct 21, 2012 · 4 comments
Closed

merging on dates ignores sort keyword #2098

jseabold opened this issue Oct 21, 2012 · 4 comments
Assignees
Labels
Bug Datetime Datetime data dtype
Milestone

Comments

@jseabold
Copy link
Contributor

import pandas
import numpy as np

from datetime import datetime

d = {"var1" : np.random.randint(0, 10, size=10), "var2" : np.random.randint(0, 10, size=10), "var3" : [datetime(2012, 1, 12), datetime(2011, 2, 4), datetime(2010, 2, 3), datetime(2012, 1, 12), datetime(2011, 2, 4), datetime(2012, 4, 3), datetime(2012, 3, 4), datetime(2008, 5, 1), datetime(2010, 2, 3), datetime(2012, 2, 3)]}
df = pandas.DataFrame.from_dict(d)
df.ix[:7]
var3 = df.var3.unique()
var3.sort()
new = pandas.DataFrame.from_dict({"var3" : var3, "var8" : np.random.random(7)})
df.merge(new, on="var3", sort=False).ix[:7]
@jseabold
Copy link
Contributor Author

It's also odd that it doesn't preserve the index?

import string

d = {"var1" : np.random.randint(0, 10, size=10), "var2" : np.random.randint(0, 10, size=10), "var3" : [datetime(2012, 1, 12), datetime(2011, 2, 4), datetime(2010, 2, 3), datetime(2012, 1, 12), datetime(2011, 2, 4), datetime(2012, 4, 3), datetime(2012, 3, 4), datetime(2008, 5, 1), datetime(2010, 2, 3), datetime(2012, 2, 3)], "index" : list(string.letters)[:10]}
df = pandas.DataFrame.from_dict(d)
df = df.set_index("index")
df.ix[:7]
var3 = df.var3.unique()
var3.sort()
new = pandas.DataFrame.from_dict({"var3" : var3, "var8" : np.random.random(7)})
df.merge(new, on="var3", sort=False).ix[:7]

@ghost ghost assigned wesm Nov 2, 2012
@wesm
Copy link
Member

wesm commented Nov 2, 2012

From the docstring:

If joining columns on columns, the DataFrame indexes *will be
ignored*. Otherwise if joining indexes on indexes or indexes on a column or
columns, the index will be passed on.

I'm looking into the sort argument business...

wesm added a commit that referenced this issue Nov 2, 2012
@wesm
Copy link
Member

wesm commented Nov 2, 2012

This isn't easy to "fix"-- this actually only causes an issue in the many-to-one merge case that you show here

In [3]: df
Out[3]: 
   var1  var2                 var3
0     9     9  2012-01-12 00:00:00
1     5     9  2011-02-04 00:00:00
2     4     4  2010-02-03 00:00:00
3     1     1  2012-01-12 00:00:00
4     5     0  2011-02-04 00:00:00
5     9     1  2012-04-03 00:00:00
6     2     2  2012-03-04 00:00:00
7     7     6  2008-05-01 00:00:00
8     4     8  2010-02-03 00:00:00
9     1     3  2012-02-03 00:00:00

In [4]: result
Out[4]: 
   var1  var2                 var3      var8
0     9     9  2012-01-12 00:00:00  0.493382
1     1     1  2012-01-12 00:00:00  0.493382
2     5     9  2011-02-04 00:00:00  0.690644
3     5     0  2011-02-04 00:00:00  0.690644
4     4     4  2010-02-03 00:00:00  0.183405
5     4     8  2010-02-03 00:00:00  0.183405
6     9     1  2012-04-03 00:00:00  0.549471
7     2     2  2012-03-04 00:00:00  0.156371
8     7     6  2008-05-01 00:00:00  0.823014
9     1     3  2012-02-03 00:00:00  0.467584

You see the duplicated values are grouped together, but in the order that each distinct value is observed. Contrast this with:

In [5]: result = df.merge(new, on="var3", sort=True)

In [6]: result
Out[6]: 
   var1  var2                 var3      var8
0     7     6  2008-05-01 00:00:00  0.823014
1     4     4  2010-02-03 00:00:00  0.183405
2     4     8  2010-02-03 00:00:00  0.183405
3     5     9  2011-02-04 00:00:00  0.690644
4     5     0  2011-02-04 00:00:00  0.690644
5     9     9  2012-01-12 00:00:00  0.493382
6     1     1  2012-01-12 00:00:00  0.493382
7     1     3  2012-02-03 00:00:00  0.467584
8     2     2  2012-03-04 00:00:00  0.156371
9     9     1  2012-04-03 00:00:00  0.549471

postponing to post-0.9.1 for thinking more about

@jreback
Copy link
Contributor

jreback commented Sep 21, 2013

closing as test in

@jreback jreback closed this as completed Sep 21, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype
Projects
None yet
Development

No branches or pull requests

3 participants