Incorrect join of DataFrames with non-unique datetime indices #1306

Closed
leonbaum opened this issue May 24, 2012 · 3 comments

@leonbaum

I'm not sure whether joining DataFrames with non-unique indices is now supported, but it doesn't raise an error, and the result of this simple example doesn't make sense:

In [11]: df1 = pandas.DataFrame({'x': ['a']}, index=[np.datetime64('2012')])

In [12]: df2 = pandas.DataFrame({'y': ['b', 'c']}, index=[np.datetime64('2012')] * 2)

In [13]: df1
Out[13]: 
                     x
1970-01-16 08:09:36  a

In [14]: df2
Out[14]: 
                     y
1970-01-16 08:09:36  b
1970-01-16 08:09:36  c

In [15]: df1.join(df2, how='inner')
Out[15]: 
                     x  y
1970-01-16 08:09:36  a  b

Shouldn't the 1st row of df1 join to both rows of df2?
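
For reference, a minimal sketch of the one-to-many behaviour the question expects, written against a current pandas release rather than the 2012 master branch discussed here. The index is built with `pd.to_datetime` instead of `np.datetime64('2012')` to sidestep the timestamp problem; that substitution is an assumption made for illustration.

```python
import pandas as pd

# One timestamp on the left, the same timestamp twice on the right.
idx = pd.to_datetime(["2012-01-01"])
df1 = pd.DataFrame({"x": ["a"]}, index=idx)
df2 = pd.DataFrame({"y": ["b", "c"]}, index=idx.repeat(2))

# The single left row should pair with both right rows.
joined = df1.join(df2, how="inner")
print(joined)
```

On a release with many-to-one index joins, `joined` has two rows, both carrying `x == 'a'`.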

@leonbaum
Author

I just noticed the timestamp is also screwed up, but I'm guessing that's a separate issue.

I'm using the latest master branch, btw.
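
As a side note on the displayed value `1970-01-16 08:09:36`: it is exactly what results when the number of seconds from the Unix epoch to 2012-01-01 is read as milliseconds. The sketch below only checks that arithmetic with the standard library. The thread does not confirm that this is the actual cause, so treat the seconds-versus-milliseconds reading as an assumption.

```python
from datetime import datetime, timedelta, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Seconds from the epoch to 2012-01-01 00:00:00 UTC.
seconds_2012 = int((datetime(2012, 1, 1, tzinfo=timezone.utc) - epoch).total_seconds())

# Hypothetical misreading: treat that count as milliseconds (off by a factor of 1000).
misread = epoch + timedelta(milliseconds=seconds_2012)
print(seconds_2012, misread)
```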

@wesm
Member

wesm commented May 24, 2012

It looks like an edge case to me; I'll look into it. I'll fix the timestamp issue too. Unfortunately, the NumPy datetime API is a disaster in NumPy 1.6.1 and I'm doing my best to work around it. Things will be much improved in NumPy 1.7 and later.

@wesm
Member

wesm commented May 25, 2012

I worked through this and built the many-to-one and many-to-many join machinery today for indexes. Was not easy:

In [6]: df1.join(df2)
Out[6]: 
                     x  y
1970-01-16 08:09:36  a  b
1970-01-16 08:09:36  a  c

The timestamp handling is a separate matter, so I'm closing this issue.
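
The many-to-many case mentioned above can be sketched the same way on a current pandas release. The frames and column values here (`left`, `right`, `a1`, `a2`) are hypothetical and not taken from the issue. Each duplicated index value on the left is expected to pair with every matching row on the right.

```python
import pandas as pd

ts = pd.Timestamp("2012-01-01")

# Both frames repeat the same index value twice.
left = pd.DataFrame({"x": ["a1", "a2"]}, index=[ts, ts])
right = pd.DataFrame({"y": ["b", "c"]}, index=[ts, ts])

# An index join on duplicated keys yields every left/right combination.
joined = left.join(right)
pairs = sorted(zip(joined["x"], joined["y"]))
print(joined)
```

With two matching rows on each side, the result should contain four rows, one per combination.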
