Pandas doesn't accept dtype=np.datetime64 #8004

Closed
Wilfred opened this issue Aug 12, 2014 · 15 comments
Labels
Datetime (Datetime data dtype), Dtype Conversions (Unexpected or buggy dtype conversions), Error Reporting (Incorrect or improved errors from pandas)

Comments

@Wilfred
Contributor

Wilfred commented Aug 12, 2014

In [1]: from pandas import Series

In [3]: import numpy as np  

In [4]: Series([], dtype=np.datetime64)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-139e5646d234> in <module>()
----> 1 Series([], dtype=np.datetime64)

/users/is/whughes/pyenvs/1c4f50ee0e346415/lib/python2.7/site-packages/pandas-0.13.1-py2.7-linux-x86_64.egg/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    218             else:
    219                 data = _sanitize_array(data, index, dtype, copy,
--> 220                                        raise_cast_failure=True)
    221 
    222                 data = SingleBlockManager(data, index, fastpath=True)

/users/is/whughes/pyenvs/1c4f50ee0e346415/lib/python2.7/site-packages/pandas-0.13.1-py2.7-linux-x86_64.egg/pandas/core/series.py in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
   2564 
   2565     else:
-> 2566         subarr = _try_cast(data, False)
   2567 
   2568     # scalar like

/users/is/whughes/pyenvs/1c4f50ee0e346415/lib/python2.7/site-packages/pandas-0.13.1-py2.7-linux-x86_64.egg/pandas/core/series.py in _try_cast(arr, take_fast_path)
   2509 
   2510         try:
-> 2511             arr = _possibly_cast_to_datetime(arr, dtype)
   2512             subarr = pa.array(arr, dtype=dtype, copy=copy)
   2513         except (ValueError, TypeError):

/users/is/whughes/pyenvs/1c4f50ee0e346415/lib/python2.7/site-packages/pandas-0.13.1-py2.7-linux-x86_64.egg/pandas/core/common.py in _possibly_cast_to_datetime(value, dtype, coerce)
   1659                 else:
   1660                     raise TypeError(
-> 1661                         "cannot convert datetimelike to dtype [%s]" % dtype)
   1662             elif is_timedelta64 and dtype != _TD_DTYPE:
   1663                 if dtype.name == 'timedelta64[ns]':

TypeError: cannot convert datetimelike to dtype [datetime64]

The following workaround works:

Series([], dtype=datetime)

but I would expect passing in the numpy type to work too.

@jreback
Contributor

jreback commented Aug 12, 2014

All datetimes are kept internally as datetime64[ns] (and converted to it on input), so this doesn't make sense.

And conversions are already way too complicated.

You are certainly welcome to try to fix this, but is there an actual use case?

@cpcloud
Member

cpcloud commented Aug 12, 2014

these kinds of dtypes are also parameterized, whereas things like int64, float64 etc. are not, so it doesn't make sense to pass one in unparameterized.

also, empty series of datetimes are very edge-casey. what's the motivation?

i would even argue that passing in the dtype to series is unnecessary 95% of the time (pandas is pretty good about doing the right thing). if you need to coerce to datetimes, use pd.to_datetime

@Wilfred
Contributor Author

Wilfred commented Aug 12, 2014

I wanted to explicitly create a Series that contained datetimes, and the docstring of Series says:

dtype : numpy.dtype or None
     If None, dtype will be inferred

so I assumed that I could pass in a numpy type. Perhaps it's just a matter of clarifying the docstring?

Whilst I used an empty series as a minimal reproducible example, this does occur for non-empty series too:

from datetime import datetime
Series([datetime.now()], dtype=np.datetime64) # same error
Series([np.datetime64(datetime.now())], dtype=np.datetime64) # same error

This behaviour does actually work with other numpy dtypes, e.g. Series([], dtype=np.int64), which is what set my expectations.

@cpcloud
Member

cpcloud commented Aug 12, 2014

"I wanted to explicitly create a Series that contained datetime"

you can create the Series and then call pd.to_datetime on it

do you have a non-trivial use case?

that would help us evaluate whether this is necessary
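
For reference, a minimal sketch of the "create, then convert" route suggested above. The sample date strings are made up for illustration, and the exact resolution that pd.to_datetime returns may differ between pandas versions, so the sketch only prints the resulting dtype rather than assuming one. For the empty case it passes the fully parameterized string "datetime64[ns]" instead of the bare np.datetime64.

```python
import pandas as pd

# Build the Series without a dtype first (strings here), then coerce.
raw = pd.Series(["2014-08-12", "2014-08-13"])
converted = pd.to_datetime(raw)
print(converted.dtype)  # a datetime64 dtype; the unit depends on the pandas version

# For an empty Series, spell out the unit instead of passing np.datetime64.
empty = pd.Series([], dtype="datetime64[ns]")
print(empty.dtype)
```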

@jreback
Contributor

jreback commented Aug 12, 2014

@Wilfred this is the point of to_datetime. As I said, you could allow this (there is an explicit check for NOT allowing it), but I'm not sure there is a relevant use case. The docstring is generic. That's why there is an error.

@ischwabacher
Contributor

If all you want is a workaround, note that you can do this:


In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: pd.Series(dtype='m8[ns]')
Out[3]: Series([], dtype: timedelta64[ns])

In [4]: pd.Series(dtype=np.timedelta64(0, 'ns').dtype)
Out[4]: Series([], dtype: timedelta64[ns])

The problem is that numpy doesn't make it easy to get from timedelta64 to timedelta64[ns], but pandas demands nanoseconds.

@cpcloud
Member

cpcloud commented Aug 12, 2014

"numpy doesn't make it easy to get from timedelta64 to timedelta64[ns]"

that's because timedelta64 without a unit doesn't make sense

@cpcloud
Member

cpcloud commented Aug 12, 2014

even though for some reason numpy still lets you think that it does

In [18]: np.dtype('m8')
Out[18]: dtype('<m8')

@ischwabacher
Contributor

I don't mean a timedelta64 object, I mean the timedelta64 dtype. AFAIK the only paths to the timedelta64[ns] dtype go either through an instance (as in In[4]) or through the string 'm8[ns]'.

@jreback
Contributor

jreback commented Aug 12, 2014

we are talking about M8 here, right? (not m8)

@ischwabacher
Contributor

Whoops, yes. The point still stands.
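
To make the distinction concrete for M8, here is a small NumPy-only sketch of the unit-less ("generic") datetime64 dtype next to the nanosecond one, reached through a string and through an instance as described above. It uses np.datetime_data to inspect the unit; the variable names are only for illustration.

```python
import numpy as np

# Passing the bare type gives the unit-less, "generic" datetime64 dtype.
generic = np.dtype(np.datetime64)

# Routes to the nanosecond dtype: a string spec or an existing instance.
from_short_string = np.dtype("M8[ns]")
from_long_string = np.dtype("datetime64[ns]")
from_instance = np.datetime64(0, "ns").dtype

print(np.datetime_data(generic))            # ('generic', 1)
print(np.datetime_data(from_short_string))  # ('ns', 1)

# The three nanosecond dtypes are the same; the generic one is different.
print(from_short_string == from_long_string == from_instance)  # True
print(generic == from_short_string)                            # False
```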

@Wilfred
Contributor Author

Wilfred commented Aug 12, 2014

OK, perhaps the best solution would be to throw an explicit error about unparameterised dtypes, so the user knows how to fix it. Something like:

Series([], dtype=np.datetime64)
TypeError: Can't use unparameterised dtypes. Choose a concrete dtype (e.g. "datetime64[ns]" instead of "datetime64") or let pandas infer the type.
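
A rough sketch of what such a check could look like, using only NumPy. The helper name require_datetime_unit and the wording of the message are hypothetical and not part of pandas; the sketch assumes that np.datetime_data reports the unit "generic" for a datetime64 or timedelta64 dtype given without a unit.

```python
import numpy as np


def require_datetime_unit(dtype):
    """Hypothetical helper (not a pandas API): reject unit-less datetime dtypes."""
    dt = np.dtype(dtype)
    # kind 'M' is datetime64, kind 'm' is timedelta64
    if dt.kind in "mM" and np.datetime_data(dt)[0] == "generic":
        raise TypeError(
            "dtype %r has no unit; choose a concrete dtype such as "
            "'%s[ns]' or let pandas infer the type" % (dt.name, dt.name)
        )
    return dt


print(require_datetime_unit("M8[ns]"))  # accepted
print(require_datetime_unit(np.int64))  # non-datetime dtypes pass through

try:
    require_datetime_unit(np.datetime64)
except TypeError as exc:
    print(exc)
```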

@ischwabacher
Contributor

I was under the impression that from pandas's point of view, there is one true datetime64 dtype and that dtype is datetime64[ns], and any other datetime64 dtype should be cast appropriately. So I would actually expect that you would get this:


In [5]: pd.Series([1], dtype='M8')
Out[5]: 
0   1970-01-01 00:00:00.000000001
dtype: datetime64[ns]

In [6]: pd.Series([1], dtype='M8[s]')
Out[6]: 
0   1970-01-01 00:00:01
dtype: datetime64[ns]

In [7]: pd.Series([1], dtype='m8[s]')
Out[7]: 
0   00:00:01
dtype: timedelta64[ns]

In [8]: pd.Series([1], dtype='m8[Y]')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-f94f55f94a5f> in <module>()
----> 1 pd.Series([1], dtype='m8[Y]')
...
TypeError: cannot convert dtype [timedelta64[Y]] to nanoseconds

Currently all of these raise.

@jreback
Contributor

jreback commented Aug 12, 2014

The point was to be explicit and make to_datetime(..., unit='...') the entry point for these. You certainly could accept these (easy, just call to_datetime where the error is now).

Tests / impl should be straightforward.

Just never had a reason to actually pass anything like this

as usually this is inferred

e.g. Series(np.array([1,2,3],dtype='m8[s]')) is ok; the dtype parameter is there to coerce the input

which is actually quite tricky because you first have to interpret it and it CAN be ambiguous
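
A short sketch of the two routes mentioned above: letting the dtype be inferred from a typed NumPy array, and going through to_datetime with an explicit unit. The values are illustrative, and the unit that pandas stores internally (nanoseconds or the input's own unit) depends on the pandas version, so it is printed rather than assumed.

```python
import numpy as np
import pandas as pd

# Route 1: the array already carries a unit, so no dtype argument is needed.
seconds = np.array([1, 2, 3], dtype="m8[s]")
inferred = pd.Series(seconds)
print(inferred.dtype)  # a timedelta64 dtype; stored unit varies by pandas version

# Route 2: plain integers plus an explicit unit, via to_datetime.
stamps = pd.to_datetime([1, 2, 3], unit="s")
print(stamps[0])  # one second after the Unix epoch
```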

jreback added the Error Reporting label Oct 2, 2014
jreback added this to the 0.15.1 milestone Oct 2, 2014
jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jbrockmendel
Member

Closed by #23392
