
BUG: pandas.tools.util.cartesian_product doesn't work on DatetimeIndex objects #6439


Closed
shoyer opened this issue Feb 22, 2014 · 5 comments · Fixed by #6451

Comments

shoyer (Member) commented Feb 22, 2014

This breaks the new pandas.MultiIndex.from_product function when one of the product arrays is a DatetimeIndex. Observe:

>>> import pandas as pd
>>> from pandas.tools.util import cartesian_product
>>> x, y = cartesian_product([[1, 2], pd.date_range('2000-01-01', periods=2)])
>>> print(x)
[1 1 2 2]
>>> print(y.values)
['1999-12-31T16:00:00.000000000-0800' '1999-12-31T16:00:00.000000000-0800'
 '2000-01-01T16:00:00.000000000-0800' '2000-01-01T16:00:00.000000000-0800']

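If cartesian_product were working properly, the dates would alternate: [2000-01-01, 2000-01-02, 2000-01-01, 2000-01-02] (displayed above in local time, UTC-8, as 1999-12-31T16:00 and 2000-01-01T16:00, i.e. [1999, 2000, 1999, 2000]).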

The simple fix would be to convert everything into a numpy array before calculating the cartesian product.
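A minimal sketch of that idea, assuming the usual repeat/tile formulation of a Cartesian product. The name cartesian_product_sketch is hypothetical (it is not the pandas function or the code merged in #6451), and it targets a current NumPy/pandas rather than the 2014 code base. Every input is passed through np.asarray first, so np.repeat and np.tile only ever see plain ndarrays, never an Index subclass:

```python
import numpy as np
import pandas as pd


def cartesian_product_sketch(arrays):
    """Hypothetical helper: return one flat ndarray per input so that zipping
    the outputs enumerates the Cartesian product, last input varying fastest."""
    # Convert lists, Index, DatetimeIndex, ... to plain ndarrays up front.
    arrays = [np.asarray(x) for x in arrays]
    lengths = np.array([len(x) for x in arrays], dtype=np.int64)
    result = []
    for i, x in enumerate(arrays):
        # Each element repeats consecutively once per combination of later inputs.
        n_repeat = int(np.prod(lengths[i + 1:]))
        # The repeated block is tiled once per combination of earlier inputs.
        n_tile = int(np.prod(lengths[:i]))
        result.append(np.tile(np.repeat(x, n_repeat), n_tile))
    return result


x, y = cartesian_product_sketch([[1, 2], pd.date_range('2000-01-01', periods=2)])
# x is [1, 1, 2, 2]; y alternates 2000-01-01, 2000-01-02, 2000-01-01, 2000-01-02
```

Because the conversion happens before any reshaping, the output for a DatetimeIndex is a datetime64[ns] array that follows the same ordering as plain lists.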

jreback (Contributor) commented Feb 22, 2014

In [3]: MultiIndex.from_product([[1, 2], pd.date_range('2000-01-01', periods=2)]).values
Out[3]: 
array([(1, Timestamp('2000-01-01 00:00:00', tz=None)),
       (1, Timestamp('2000-01-01 00:00:00', tz=None)),
       (2, Timestamp('2000-01-02 00:00:00', tz=None)),
       (2, Timestamp('2000-01-02 00:00:00', tz=None))], dtype=object)

In [4]: from pandas.tools.util import cartesian_product

In [5]: x, y = cartesian_product([[1, 2], pd.date_range('2000-01-01', periods=2)])

In [6]: x
Out[6]: array([1, 1, 2, 2])

In [7]: y
Out[7]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01, ..., 2000-01-02]
Length: 4, Freq: None, Timezone: None

In [8]: y.values
Out[8]: 
array(['1999-12-31T19:00:00.000000000-0500',
       '1999-12-31T19:00:00.000000000-0500',
       '2000-01-01T19:00:00.000000000-0500',
       '2000-01-01T19:00:00.000000000-0500'], dtype='datetime64[ns]')

looks ok to me, what is the problem?

jorisvandenbossche (Member)

It's the order that is not OK:

In [22]: x, y = cartesian_product([[1, 2], pd.date_range('2000-01-01', periods=2)])
In [23]: x
Out[23]: array([1, 1, 2, 2])
In [25]: y.day
Out[25]: array([1, 1, 2, 2], dtype=int64)

In [26]: x, y = cartesian_product([[1, 2], [1,2]])
In [28]: x
Out[28]: array([1, 1, 2, 2])
In [29]: y
Out[29]: array([1, 2, 1, 2])

See the difference between y.day in the first case ([1, 1, 2, 2]) and y in the second case ([1, 2, 1, 2]).
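For reference, the expected order is the one Python's itertools.product yields: the first input varies slowest and the last varies fastest. A quick comparison, assuming a current pandas where MultiIndex.from_product is public API and the fix from #6451 is included:

```python
import itertools

import pandas as pd

levels = [[1, 2], pd.date_range('2000-01-01', periods=2)]

# Reference order: (1, Jan 1), (1, Jan 2), (2, Jan 1), (2, Jan 2).
expected = list(itertools.product(*levels))

# With the bug fixed, the MultiIndex enumerates the same pairs in the same order.
mi = pd.MultiIndex.from_product(levels)
print(list(mi) == expected)
```

Under the buggy behaviour described in this issue, the dates would come out as Jan 1, Jan 1, Jan 2, Jan 2 instead, and the comparison would be False.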

jreback (Contributor) commented Feb 22, 2014

Ahh... I see. OK, marking as a bug.

jreback added this to the 0.14.0 milestone Feb 22, 2014
jreback (Contributor) commented Feb 22, 2014

@shoyer pull-request for this?

shoyer (Member, Author) commented Feb 22, 2014

Sure - I'll send something in tomorrow or Monday.

