series with datetime raises pandas.tslib.OutOfBoundsDatetime on value_counts() #13663

Closed
13ene opened this issue Jul 15, 2016 · 2 comments · Fixed by #13772
Labels
Datetime (Datetime data dtype), Dtype Conversions (Unexpected or buggy dtype conversions)
Milestone

Comments

@13ene

13ene commented Jul 15, 2016

I expected value_counts() to return the number of entries for each value. Instead it raises an OutOfBoundsDatetime, even though the values are not pandas objects in the first place but datetime.datetime objects. I am aware that pandas Timestamps cover a limited time span, but these objects are datetimes, not Timestamps. Thanks!

Code Sample, a copy-pastable example if possible

import pandas as pd
import datetime
s = pd.Series([datetime.datetime(9999,1,1)])
s.value_counts()

Actual Output

---------------------------------------------------------------------------
OutOfBoundsDatetime                       Traceback (most recent call last)
<ipython-input-7-1b9293e6eb60> in <module>()
----> 1 s.value_counts()

/usr/lib/python2.7/dist-packages/pandas/core/base.pyc in value_counts(self, normalize, sort, ascending, bins, dropna)
    466         from pandas.tseries.api import DatetimeIndex, PeriodIndex
    467         result = value_counts(self, sort=sort, ascending=ascending,
--> 468                               normalize=normalize, bins=bins, dropna=dropna)
    469 
    470         if isinstance(self, PeriodIndex):

/usr/lib/python2.7/dist-packages/pandas/core/algorithms.pyc in value_counts(values, sort, ascending, normalize, bins, dropna)
    318 
    319         if not isinstance(keys, Index):
--> 320             keys = Index(keys)
    321         result = Series(counts, index=keys, name=name)
    322 

/usr/lib/python2.7/dist-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name, fastpath, tupleize_cols, **kwargs)
    177                         tslib.is_timestamp_array(subarr)):
    178                         from pandas.tseries.index import DatetimeIndex
--> 179                         return DatetimeIndex(subarr, copy=copy, name=name, **kwargs)
    180                     elif (inferred.startswith('timedelta') or
    181                           lib.is_timedelta_array(subarr)):

/usr/lib/python2.7/dist-packages/pandas/util/decorators.pyc in wrapper(*args, **kwargs)
     87                 else:
     88                     kwargs[new_arg_name] = new_arg_value
---> 89             return func(*args, **kwargs)
     90         return wrapper
     91     return _deprecate_kwarg

/usr/lib/python2.7/dist-packages/pandas/tseries/index.pyc in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
    317                 except ValueError:
    318                     # tz aware
--> 319                     subarr = tools._to_datetime(data, box=False, utc=True)
    320 
    321                 # we may not have been able to convert

/usr/lib/python2.7/dist-packages/pandas/tseries/tools.pyc in _to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, freq, infer_datetime_format)
    393         return _convert_listlike(arg, box, format, name=arg.name)
    394     elif com.is_list_like(arg):
--> 395         return _convert_listlike(arg, box, format)
    396 
    397     return _convert_listlike(np.array([ arg ]), box, format)[0]

/usr/lib/python2.7/dist-packages/pandas/tseries/tools.pyc in _convert_listlike(arg, box, format, name)
    381                 return DatetimeIndex._simple_new(values, name=name, tz=tz)
    382             except (ValueError, TypeError):
--> 383                 raise e
    384 
    385     if arg is None:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 9999-01-01 00:00:00
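
For context on the "nanosecond timestamp" wording in the error: datetime64[ns] stores each value as a signed 64-bit count of nanoseconds since the Unix epoch, so the representable range ends in the year 2262. The following is a minimal stdlib-only sketch of that bound. The helper name fits_in_ns_int64 is hypothetical and is not part of pandas.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest signed 64-bit integer


def fits_in_ns_int64(dt):
    """Return True if dt's offset from the epoch, in nanoseconds, fits in int64.

    Hypothetical helper for illustration only; pandas does its own check.
    """
    delta = dt - EPOCH
    ns = (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000
    return abs(ns) <= INT64_MAX


# Latest datetime representable at nanosecond resolution
# (truncated to microseconds, the finest unit datetime.datetime supports).
upper_bound = EPOCH + datetime.timedelta(microseconds=INT64_MAX // 1000)
print(upper_bound)

print(fits_in_ns_int64(datetime.datetime(2016, 7, 15)))  # an ordinary date
print(fits_in_ns_int64(datetime.datetime(9999, 1, 1)))   # the value from the report
```

The value from the report lies far beyond that upper bound, which is why any attempt to coerce it to datetime64[ns] fails.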

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-28-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.7.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 2.4.1
sphinx: None
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.5.1
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
Jinja2: None


@TomAugspurger
Contributor

TomAugspurger commented Jul 15, 2016

It looks like the problem arises when we construct the index in value_counts (the keys = Index(keys) call in the traceback). Essentially we call Index([datetime.datetime(9999,1,1)]) instead of Index([datetime.datetime(9999,1,1)], dtype=object). Want to see if making that change fixes it without breaking any tests?
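
A rough sketch of the difference described above, assuming a pandas release that already includes the fix (on 0.17.1 the value_counts() call still raises). Passing dtype=object keeps the values as plain Python datetimes, so no conversion to datetime64[ns] is attempted. The variable names are illustrative.

```python
import datetime

import pandas as pd

far_future = datetime.datetime(9999, 1, 1)

# dtype=object asks pandas to keep the values as plain Python objects
# instead of inferring a datetime64[ns] index.
idx = pd.Index([far_future], dtype=object)
print(idx.dtype)

# The same idea applied to a Series like the one in the report.
# Assumption: the installed pandas includes the fix for this issue.
s = pd.Series([far_future, far_future], dtype=object)
counts = s.value_counts()
print(counts)
```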

Any reason you're using datetime.datetimes with object dtype instead of Periods?

In [4]: pd.Series([pd.Period('9999-01-01')]).value_counts()
Out[4]:
9999-01-01    1
Freq: D, dtype: int64

That might be less painful for you since there will surely be other places in pandas that don't support datetimes with object dtype. It might not be worth fixing this one case, if Periods provide a better alternative.
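
If the data already arrives as datetime.datetime objects, the Period suggestion could look roughly like this. It is a sketch that assumes daily resolution is enough for the column; the sample values are made up for illustration.

```python
import datetime

import pandas as pd

# Made-up sample values; 9999-01-01 stands in for the far-future entries.
raw_values = [
    datetime.datetime(9999, 1, 1),
    datetime.datetime(2016, 7, 15),
    datetime.datetime(9999, 1, 1),
]

# Convert each datetime to a daily Period. Periods are not limited to the
# nanosecond Timestamp range, so year 9999 is representable.
periods = pd.Series([pd.Period(d, freq="D") for d in raw_values])

counts = periods.value_counts()
print(counts)
```

Note that freq="D" discards the time-of-day component, which may or may not be acceptable for a given column.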

@TomAugspurger added the Datetime (Datetime data dtype) and Dtype Conversions (Unexpected or buggy dtype conversions) labels Jul 15, 2016
@13ene
Author

13ene commented Jul 15, 2016

I read the data from a SQL database via pd.read_sql_query. For columns whose values stay within the bounds, the dtype is datetime64[ns]; however, some columns mark infinity as year 9999, and the dtype for those is object.
Thanks for the advice on Periods.
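
On an affected pandas version, one possible stopgap for such object columns is to count the values without building a pandas index at all. A minimal sketch using only the standard library; the sample values and the SENTINEL name are assumptions for illustration, not taken from the actual database.

```python
import collections
import datetime

# Assumed sentinel: the report says infinity is marked as year 9999.
SENTINEL = datetime.datetime(9999, 1, 1)

# Made-up column contents standing in for an object-dtype SQL column.
values = [
    SENTINEL,
    datetime.datetime(2016, 7, 15),
    SENTINEL,
]

# Counter hashes the datetime objects directly, so no conversion to
# datetime64[ns] is ever attempted.
counts = collections.Counter(values)

for value, n in counts.most_common():
    label = "infinity" if value == SENTINEL else value.isoformat()
    print(label, n)
```

With a real column, list(series) or series.tolist() would take the place of the hard-coded values list.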

@sinhrks changed the title from "series with datetime raises pandas.tslib.OutOfBoundsDatetime on count_values()" to "series with datetime raises pandas.tslib.OutOfBoundsDatetime on value_counts()" Jul 24, 2016
@jreback added this to the 0.19.0 milestone Jul 24, 2016