Failing statsmodels tests on pandas master vs. 0.12.0 #5312

jseabold · 2013-10-24T18:46:12Z

https://launchpadlibrarian.net/154849014/buildlog_ubuntu-trusty-i386.statsmodels_0.6.0~ppa18~revno-1486~ubuntu14.04.1_UPLOADING.txt.gz

======================================================================
ERROR: statsmodels.iolib.tests.test_foreign.test_genfromdta_datetime
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/lib/python2.7/dist-packages/numpy/testing/decorators.py", line 146, in skipper_func
    return f(*args, **kwargs)
  File "/build/buildd/statsmodels-0.6.0~ppa18~revno/debian/python-statsmodels/usr/lib/python2.7/dist-packages/statsmodels/iolib/tests/test_foreign.py", line 139, in test_genfromdta_datetime
    pandas=True)
  File "/build/buildd/statsmodels-0.6.0~ppa18~revno/debian/python-statsmodels/usr/lib/python2.7/dist-packages/statsmodels/iolib/foreign.py", line 1049, in genfromdta
    args=(fmtlist[i],))
  File "/usr/lib/pymodules/python2.7/pandas/core/series.py", line 1978, in apply
    return self._constructor(mapped, index=self.index).__finalize__(self)
  File "/usr/lib/pymodules/python2.7/pandas/core/series.py", line 217, in __init__
    data = SingleBlockManager(data, index, fastpath=True)
  File "/usr/lib/pymodules/python2.7/pandas/core/internals.py", line 3295, in __init__
    block = make_block(block, axis, axis, ndim=1, fastpath=True)
  File "/usr/lib/pymodules/python2.7/pandas/core/internals.py", line 1806, in make_block
    return klass(values, items, ref_items, ndim=ndim, fastpath=fastpath, placement=placement)
  File "/usr/lib/pymodules/python2.7/pandas/core/internals.py", line 1412, in __init__
    values = tslib.cast_to_nanoseconds(values)
  File "tslib.pyx", line 1453, in pandas.tslib.cast_to_nanoseconds (pandas/tslib.c:22283)
TypeError: Cannot change data-type for object array.

======================================================================
ERROR: statsmodels.tsa.tests.test_arima.test_arma_predict_indices
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/build/buildd/statsmodels-0.6.0~ppa18~revno/debian/python-statsmodels/usr/lib/python2.7/dist-packages/statsmodels/tsa/tests/test_arima.py", line 975, in test_arma_predict_indices
    _check_start(*((model,) + case))
  File "/build/buildd/statsmodels-0.6.0~ppa18~revno/debian/python-statsmodels/usr/lib/python2.7/dist-packages/statsmodels/tsa/tests/test_arima.py", line 921, in _check_start
    start = model._get_predict_start(given, dynamic)
  File "/build/buildd/statsmodels-0.6.0~ppa18~revno/debian/python-statsmodels/usr/lib/python2.7/dist-packages/statsmodels/tsa/arima_model.py", line 562, in _get_predict_start
    start = super(ARMA, self)._get_predict_start(start)
  File "/build/buildd/statsmodels-0.6.0~ppa18~revno/debian/python-statsmodels/usr/lib/python2.7/dist-packages/statsmodels/tsa/base/tsa_model.py", line 130, in _get_predict_start
    self._set_predict_start_date(start)
  File "/build/buildd/statsmodels-0.6.0~ppa18~revno/debian/python-statsmodels/usr/lib/python2.7/dist-packages/statsmodels/tsa/base/tsa_model.py", line 105, in _set_predict_start_date
    start, self.data.freq)
  File "/build/buildd/statsmodels-0.6.0~ppa18~revno/debian/python-statsmodels/usr/lib/python2.7/dist-packages/statsmodels/tsa/base/datetools.py", line 78, in _date_from_idx
    return d1 + idx * _freq_to_pandas[freq]
  File "/usr/lib/pymodules/python2.7/pandas/tseries/offsets.py", line 193, in __radd__
    return self.__add__(other)
  File "/usr/lib/pymodules/python2.7/pandas/tseries/offsets.py", line 188, in __add__
    return self.apply(other)
  File "/usr/lib/pymodules/python2.7/pandas/tseries/offsets.py", line 1537, in apply
    return Timestamp(result)
  File "tslib.pyx", line 153, in pandas.tslib.Timestamp.__new__ (pandas/tslib.c:5375)
  File "tslib.pyx", line 773, in pandas.tslib.convert_to_tsobject (pandas/tslib.c:13113)
  File "tslib.pyx", line 858, in pandas.tslib._check_dts_bounds (pandas/tslib.c:14219)
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2317-12-31 00:00:00

Did something change?


jreback · 2013-10-24T19:04:08Z

when did this start failing?

jseabold · 2013-10-25T08:30:26Z

Somewhere between master and 0.12.0.

jreback · 2013-10-25T12:08:14Z

What I mean is that I tested statsmodels 0.5.0 against master maybe a month ago and these were fine.

jseabold · 2013-10-25T12:17:06Z

None of this code has changed in statsmodels. I see these failures with 0.5.0 against pandas master.

jseabold · 2013-10-25T12:18:12Z

Something in numpy?

numpy: 1.9.0.dev-b5dab6d (/usr/local/lib/python2.7/dist-packages/numpy)

jseabold · 2013-10-25T12:18:40Z

Sorry, I've been bug hunting all week and don't have time to track this one down at the moment.

jreback · 2013-10-25T13:59:17Z

1st issue:

I think your _stata_elapsed_date_to_datetime, given a format of %ty, converts a number
like 2 to datetime(2,1,1), which is invalid. I'm not sure what it is supposed to be (2000?).
You either need some validation there or you can catch the exception.

The pandas converters are now a little stricter about how they convert, which is why I think this is raising.

in statsmodels/iolib/foreign.py:genfromdta

1047                    col = data.columns[col]
1048                    data[col] = data[col].apply(_stata_elapsed_date_to_datetime,
1049 ->                         args=(fmtlist[i],))
1050        elif convert_dates:
1051            #date_cols = np.where(map(lambda x : x in _date_formats,
1052            #                                                    fmtlist))[0]
1053            # make the dtype for the datetime types
1054            cols = np.where(map(lambda x : x in _date_formats, fmtlist))[0]
(Pdb) p fmtlist[i]
'%ty'
(Pdb) p data[col]
0    2010
1       2
Name: yearly_date, dtype: int32

jseabold · 2013-10-25T14:05:53Z

No, this was deliberate in the test suite, since this is a valid date in Stata epoch time. It used to round-trip fine from this:

from datetime import datetime
datetime(2, 1,1)
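
For context, a minimal standard-library sketch of why datetime(2, 1, 1) is a valid Python datetime but cannot be stored as a nanosecond timestamp. It assumes the timestamp is a signed 64-bit count of nanoseconds since 1970-01-01, which is what the "Out of bounds nanosecond timestamp" error above suggests. The helper name fits_in_ns_timestamp is hypothetical, and the bounds are approximate because datetime only has microsecond resolution.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX_NS = 2**63 - 1

# Assumption: timestamps are a signed 64-bit nanosecond count from the epoch.
# datetime has microsecond resolution, so the nanosecond remainder is dropped
# and the resulting bounds are very slightly narrower than the true range.
NS_MAX = EPOCH + timedelta(microseconds=INT64_MAX_NS // 1000)
NS_MIN = EPOCH - timedelta(microseconds=INT64_MAX_NS // 1000)


def fits_in_ns_timestamp(dt):
    """Return True if a naive datetime falls inside the int64 nanosecond range."""
    return NS_MIN <= dt <= NS_MAX


# The three dates that appear in this thread.
for candidate in (datetime(2, 1, 1), datetime(2008, 12, 31), datetime(2317, 12, 31)):
    print(candidate, fits_in_ns_timestamp(candidate))
```

Under that assumption the representable range runs from roughly 1677 to 2262, so both year 2 and year 2317 fall outside it while 2008 does not.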

jreback · 2013-10-25T14:06:08Z

The 2nd is an out-of-bounds date (which again was not actually checked before; in 0.12 it was possible for an out-of-bounds date to slip through). You can catch this error.

> /home/vagrant/statsmodels/statsmodels/tsa/base/tsa_model.py(105)_set_predict_start_date()
-> start, self.data.freq)
(Pdb) l
100                 return
101             if start > len(dates):
102                 raise ValueError("Start must be <= len(endog)")
103             if start == len(dates):
104                 self.data.predict_start = datetools._date_from_idx(dates[-1],
105  ->                                                     start, self.data.freq)
106             elif start < len(dates):
107                 self.data.predict_start = dates[start]
108             else:
109                 raise ValueError("Start must be <= len(dates)")
110  
(Pdb) p dates
<class 'pandas.tseries.index.DatetimeIndex'>
[1700-12-31 00:00:00, ..., 2008-12-31 00:00:00]
Length: 309, Freq: None, Timezone: None
(Pdb) d
> /home/vagrant/statsmodels/statsmodels/tsa/base/datetools.py(78)_date_from_idx()
-> return d1 + idx * _freq_to_pandas[freq]
(Pdb) d
> /usr/local/lib/python2.7/dist-packages/pandas-0.12.0_957_g8941429-py2.7-linux-i686.egg/pandas/tseries/offsets.py(193)__radd__()
-> return self.__add__(other)
(Pdb) d
> /usr/local/lib/python2.7/dist-packages/pandas-0.12.0_957_g8941429-py2.7-linux-i686.egg/pandas/tseries/offsets.py(188)__add__()
-> return self.apply(other)
(Pdb) d
> /usr/local/lib/python2.7/dist-packages/pandas-0.12.0_957_g8941429-py2.7-linux-i686.egg/pandas/tseries/offsets.py(1537)apply()
-> return Timestamp(result)
(Pdb) p result
datetime.datetime(2317, 12, 31, 0, 0)
(Pdb) Timestamp(result)
*** OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2317-12-31 00:00:00

jreback · 2013-10-25T14:21:56Z

OK, the 1st is a bug. I have a PR to fix it, easy.

The 2nd is more troublesome: you are adding an offset to a Timestamp, which then yields an out-of-bounds Timestamp. We normally raise on this. You could catch it and just use the result as a datetime if you want.

(Pdb) Timestamp(result)
*** OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2317-12-31 00:00:00
(Pdb) p result
datetime.datetime(2317, 12, 31, 0, 0)

jseabold · 2013-10-25T14:26:09Z

I'm trying to figure out if this second test was always broken. AFAIK, this is neither the expected result nor what it used to do, but I'm not certain yet. I need to build everything in a virtualenv to test.

jreback · 2013-10-25T14:44:10Z

OK, going to merge the first fix but leave this issue open. Let me know.

jseabold · 2013-10-25T14:53:07Z

Yeah, this one used to work too.

>>> import pandas as pd
>>> from datetime import datetime
>>> from statsmodels.tsa.base.datetools import _freq_to_pandas
>>> pd.Timestamp(datetime(2008, 12, 31)) + 309*_freq_to_pandas['A']
datetime.datetime(2317, 12, 31, 0, 0)

jseabold · 2013-10-25T14:55:07Z

>>> pd.Timestamp(datetime(2008, 12, 31)) + 309*pd.offsets.YearEnd()
datetime.datetime(2317, 12, 31, 0, 0)

jreback · 2013-10-25T15:20:03Z

@jtratner, cc @Cancan01

Please take a look here. The reason this breaks is that I put in a change to wrap the result of applying an offset in a Timestamp. I think @Cancan01 had a test breaking because it assumed the result was a Timestamp.

It's easy to fix this so that it returns a Timestamp, or the plain datetime if the result is out of bounds.

any problems with that?

jtratner · 2013-10-25T16:00:31Z

Timestamp is a datetime but with additional methods, right? So, if you assume you'll get a datetime, everything you could do with that datetime you can do with the timestamp?

jreback · 2013-10-25T16:07:26Z

@jtratner that is true. I am not sure what the guarantees were before, but @Cancan01's PR had a failing test that assumed it was getting a Timestamp, I think because of a repeated offset apply, e.g. something like

Timestamp + offset + other_offset, so they really need to return Timestamps to be consistent. That said, a Timestamp IS a datetime, except that it CANNOT hold out-of-bounds data.

jtratner · 2013-10-25T16:53:24Z

EAFP - try to be nice and keep it within range and if not return a datetime and let the failure happen later - feels the same as automatically converting an integer column to float if you add nan or add 0.1 to it, or converting indexes on slicing, etc.
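
The fallback discussed above could be sketched roughly as follows. This is not the code merged in the linked PRs, only an illustration of the idea. The helper name as_timestamp_or_datetime is hypothetical, and pd.errors.OutOfBoundsDatetime is where recent pandas releases expose the exception (in the tracebacks above it comes from pandas.tslib). Depending on the pandas version, the constructor may accept dates outside the nanosecond range instead of raising, in which case the except branch simply never runs.

```python
from datetime import datetime

import pandas as pd


def as_timestamp_or_datetime(value):
    """Try to wrap a datetime in a Timestamp; hand back the datetime if it does not fit."""
    try:
        return pd.Timestamp(value)
    except pd.errors.OutOfBoundsDatetime:
        # EAFP: let the out-of-range value pass through as a plain datetime
        # rather than failing at this point.
        return value


in_range = as_timestamp_or_datetime(datetime(2008, 12, 31))
far_future = as_timestamp_or_datetime(datetime(2317, 12, 31))

# A Timestamp is a datetime subclass, so callers can treat both results as datetimes.
print(type(in_range).__name__, in_range.year)
print(type(far_future).__name__, far_future.year)
```

Either way the caller gets something usable as a datetime, which is the behavior jseabold's examples above relied on.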

jreback mentioned this issue Oct 25, 2013

BUG: when trying to use an out-of-bounds date as an object dtype (GH5312) #5322

Merged

jreback mentioned this issue Oct 25, 2013

TST/BUG: allow invalid Timestamps to pass thru as datetimes when operating with offsets #5327

Merged

jreback closed this as completed in #5327 Oct 26, 2013
