
BUG: When using np.datetime64 should have date ranges according to frequency #4066


Closed
timcera opened this issue Jun 28, 2013 · 12 comments
Labels
Bug · Dtype Conversions (Unexpected or buggy dtype conversions) · Error Reporting (Incorrect or improved errors from pandas)

Comments

@timcera
Contributor

timcera commented Jun 28, 2013

A daily-frequency np.datetime64 should have a range of [2.5e16 BC, 2.5e16 AD]. Instead it follows the nanosecond range of [1678 AD, 2262 AD].

Ranges taken from http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html

In [126]: pd.Timestamp(np.datetime64('1677-01-01'),'D')
Out[126]: Timestamp('2261-07-22 23:34:33.709551616', tz=None)

In [127]: pd.Timestamp(np.datetime64('1678-01-01'),'D')
Out[127]: Timestamp('1678-01-01 00:00:00', tz=None)

In [128]: pd.Timestamp(np.datetime64('2262-01-01'),'D')
Out[128]: Timestamp('2262-01-01 00:00:00', tz=None)

In [129]: pd.Timestamp(np.datetime64('2263-01-01'),'D')
Out[129]: Timestamp('1678-06-12 00:25:26.290448384', tz=None)
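
The wrapped values in Out[126] and Out[129] are consistent with an unchecked int64 overflow when the day count is scaled to nanoseconds (86_400 * 10**9 ns per day); a rough numpy-only reproduction of the arithmetic (not pandas' actual code path):

import numpy as np

# Scaling the day count to nanoseconds overflows int64 for dates outside
# roughly 1677-2262; the result wraps around silently instead of raising,
# which is where the odd 1678/2261 timestamps above come from.
NS_PER_DAY = np.int64(86_400 * 10**9)
days = np.datetime64('2263-01-01', 'D').astype(np.int64)   # days since the epoch
wrapped = days * NS_PER_DAY                                 # numpy may warn, but it wraps
print(np.datetime64(int(wrapped), 'ns'))                    # lands back in mid-1678, as in Out[129]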

@jreback
Contributor

jreback commented Jun 28, 2013

pandas doesn't support these types of date frequencies.
Is there a reason you actually need this functionality?

@timcera
Contributor Author

timcera commented Jun 28, 2013

I have a general-purpose astronomy library that can currently use second-level frequency and a range from around 4700 BCE to sometime in the far future (5000 CE? something like that). It is called Astronomia, at https://bitbucket.org/timcera/astronomia. The date data structure, if you can call it that, is very primitive, with year, month, day, hour, minute, and second passed around as separate scalars, or in my development version as separate numpy arrays. That data structure isn't necessarily a bad thing; it gets converted pretty quickly to a Julian Day Number or Julian Date, which drives almost all of the other calculations.
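
For reference, that Julian Day Number conversion fits in a few lines; a minimal sketch using the standard proleptic-Gregorian formula (not the Astronomia implementation; gregorian_to_jdn is just an illustrative name):

def gregorian_to_jdn(year, month, day):
    # Julian Day Number for a proleptic-Gregorian date, using astronomical
    # year numbering (1 BCE is year 0). Standard integer formula.
    a = (14 - month) // 12           # 1 for January/February, 0 otherwise
    y = year + 4800 - a              # shift so the year term stays positive
    m = month + 12 * a - 3           # March-based month, 0..11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045

# gregorian_to_jdn(2000, 1, 1) == 2451545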

I thought that Pandas would give me a better environment, but if I move to Pandas, I don't want to lose any functionality.

At a minimum, the frequencies that aren't supported should be documented and raise an error.

Kindest regards,
Tim
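
A minimal sketch of the kind of guard that request implies, assuming one simply compares against the nanosecond bounds pandas exposes as pd.Timestamp.min and pd.Timestamp.max (fits_ns_range is a hypothetical helper, not a pandas API):

import numpy as np
import pandas as pd

def fits_ns_range(value):
    # True if a np.datetime64 value is representable as a nanosecond Timestamp.
    # The comparison is done at second resolution so the check itself cannot
    # overflow; the boundary second at the low end is treated optimistically.
    lo = pd.Timestamp.min.to_datetime64().astype('datetime64[s]')
    hi = pd.Timestamp.max.to_datetime64().astype('datetime64[s]')
    v = np.datetime64(value).astype('datetime64[s]')
    return bool(lo <= v <= hi)

# fits_ns_range(np.datetime64('2262-01-01'))  -> True
# fits_ns_range(np.datetime64('2263-01-01'))  -> False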

@hayd
Contributor

hayd commented Jul 10, 2013

@jreback should Timestamps raise here (as out of bounds)?

This seems a little strange/buggy (why are we converting this dtype to a Timestamp?):

In [1]: np.array([np.datetime64('-99999-01-01')], dtype=np.dtype('<M8[D]'))
Out[1]: array(['-99999-01-01'], dtype='datetime64[D]')

In [2]: pd.Series(np.array([np.datetime64('-99999-01-01')], dtype=np.dtype('<M8[D]')))
Out[2]:
0   1713-05-28 22:13:45.461981184
dtype: datetime64[D]

In [3]: pd.Series(np.array([np.datetime64('-99999-01-01')], dtype=np.dtype('<M8[D]')))[0]
Out[3]: Timestamp('1713-05-28 22:13:45.461981184', tz=None)

A workaround is to use the object dtype (obviously not ideal)

In [4]: pd.Series([np.datetime64('-99999-01-01')], dtype=object)
Out[4]:
0    -99999-01-01
dtype: object

@jreback I'm not saying these should be allowable times for Timestamp, but perhaps we (c|sh)ould allow this dtype (i.e. not attempt the conversion if the dtype is prescribed... or something?):

In [5]: pd.Series([np.datetime64('-99999-01-01')], dtype=np.dtype('<M8[D]'))
TypeError: cannot convert datetimelike to dtype [datetime64[D]]

weird:

In [6]: pd.Series(np.array([np.datetime64('-99999-01-01')], dtype=np.dtype('<M8[D]')), dtype=object)
Out[6]:
0    1969-12-31 23:59:59.962756
dtype: object
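
For what it's worth, that Out[6] value looks like the raw day count being reinterpreted as nanoseconds since the epoch rather than actually converted; a rough numpy-only check of that reading:

import numpy as np

# '-99999-01-01' is roughly -37 million days before the epoch. Reinterpreting
# that integer as nanoseconds puts it a few hundredths of a second before
# 1970-01-01, which is about what Out[6] shows.
days = np.datetime64('-99999-01-01').astype(np.int64)   # day count (unit inferred as [D])
print(np.datetime64(int(days), 'ns'))                    # ~1969-12-31T23:59:59.96...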

@jreback
Contributor

jreback commented Jul 11, 2013

yes, this should raise

@hayd
Contributor

hayd commented Jul 11, 2013

Is there a reason this dtype isn't allowed?

Rather than raise TypeError: cannot convert datetimelike to dtype [datetime64[D]] in _possibly_cast_to_datetime...

@jreback
Contributor

jreback commented Jul 11, 2013

everything is converted internally to datetime64[ns], so we punted on the 'conversions' since numpy 1.6.2 was too buggy
we don't have a concept of the 'output' format for datetimes, e.g. you might want to do something like

In [2]: s = Series([Timestamp('20130101')])

In [3]: s
Out[3]: 
0   2013-01-01 00:00:00
dtype: datetime64[ns]

then do s.astype('datetime64[D]'), but there is no easy way to do this
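
One way to get a day-resolution view back out, at least, is to drop down to the underlying numpy array; a minimal sketch (output format only, the Series itself stays datetime64[ns]):

import pandas as pd

s = pd.Series([pd.Timestamp('20130101'), pd.Timestamp('20130102')])

# The Series stores datetime64[ns]; the underlying numpy array can be re-cast
# to day resolution once the pandas operations are done.
out = s.values.astype('datetime64[D]')
# array(['2013-01-01', '2013-01-02'], dtype='datetime64[D]')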

@hayd
Contributor

hayd commented Jul 11, 2013

@jreback so the problem is that converting from ns raises / loses data, and it's not as simple as just returning the values rather than raising in _possibly_cast_to_datetime? (but maybe raise if coerce=True).

@jreback
Contributor

jreback commented Jul 11, 2013

The problem is that the conversion wasn't always working, so instead of producing a weird result we just disallowed it (there are many other ways to input the dates in any event); it's really just an issue with using np.datetime64.

@cancan101
Contributor

Also see #4341 and #4337

@jreback
Contributor

jreback commented Sep 26, 2013

pushing to 0.14. Maybe we can support different datetime freqs.

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
@jbrockmendel
Member

@jreback closeable?

@jreback
Contributor

jreback commented Jul 23, 2019

yes, we now raise appropriately for out-of-bounds values
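
A quick check of that, assuming a nanosecond-only Timestamp: pandas raises pd.errors.OutOfBoundsDatetime for the inputs from the original report (behavior may differ again once non-nanosecond units are supported).

import numpy as np
import pandas as pd

try:
    pd.Timestamp(np.datetime64('2263-01-01'))   # outside the datetime64[ns] range
except pd.errors.OutOfBoundsDatetime as exc:
    print(f"raised as expected: {exc}")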

@jreback jreback closed this as completed Jul 23, 2019