BUG: pandas.Series.dt.round inconsistent behaviour on NAT's with different arguments? #14940

Closed
rizac opened this issue Dec 21, 2016 · 3 comments
Labels
Bug; Datetime (Datetime data dtype); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
Milestone

Comments

rizac commented Dec 21, 2016

Code Sample

import pandas as pd
from datetime import datetime

d = pd.DataFrame(data=[[datetime(2010, 1, 1, 23, 14, 12, 599), 2], [None, 4]],
                 columns=['dtime', 'int'])
# Round the same column (one timestamp, one NaT) to three different frequencies
print(d['dtime'].dt.round('s'))
print(d['dtime'].dt.round('5s'))
print(d['dtime'].dt.round('min'))

Problem description

This is the output I get:

0   2010-01-01 23:14:12
1                   NaT
Name: dtime, dtype: datetime64[ns]

0   2010-01-01 23:14:10.000000000
1   2262-04-10 00:12:44.999999488
Name: dtime, dtype: datetime64[ns]

0   2010-01-01 23:14:00
1   2262-04-10 00:13:00
Name: dtime, dtype: datetime64[ns]

In the first case (freq argument 's') the NaT is preserved, as I would expect. In the second and third cases, however, the NaT is converted to a seemingly arbitrary datetime. Unless I am missing something (and I found nothing about this after searching and browsing the docs), this looks like a bug.

Expected Output

0 2010-01-01 23:14:12
1 NaT
Name: dtime, dtype: datetime64[ns]

0 2010-01-01 23:14:10.000000000
1 NaT
Name: dtime, dtype: datetime64[ns]

0 2010-01-01 23:14:00
1 NaT
Name: dtime, dtype: datetime64[ns]
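
The expected output above can be turned into a quick check. This is a minimal sketch, assuming Python 3 and a pandas release that already contains the fix for this issue; the `rounded` dictionary is only an illustrative name, and the series is built with `pd.to_datetime` rather than the original DataFrame to keep the example short.

```python
import pandas as pd

# One real timestamp and one missing value, as in the report.
s = pd.Series(pd.to_datetime(["2010-01-01 23:14:12.000599", None]), name="dtime")

# Round to each frequency from the report and keep the results by frequency.
rounded = {freq: s.dt.round(freq) for freq in ("s", "5s", "min")}

for freq, result in rounded.items():
    # The second element should stay NaT regardless of the frequency.
    print(freq, result.isna().tolist())
```

If the NaT survives, each line reports one non-missing value followed by one missing value.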

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 9.0.1
setuptools: 24.0.2
Cython: None
numpy: 1.11.1
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None

@rizac rizac changed the title pandas.Series.dt.round inconsistent behaviour on different frequencies pandas.Series.dt.round inconsistent behaviour on NAT's with different freq arguments Dec 21, 2016
@rizac rizac changed the title pandas.Series.dt.round inconsistent behaviour on NAT's with different freq arguments pandas.Series.dt.round inconsistent behaviour on NAT's with different freq arguments? Dec 21, 2016
@rizac rizac changed the title pandas.Series.dt.round inconsistent behaviour on NAT's with different freq arguments? pandas.Series.dt.round inconsistent behaviour on NAT's with different arguments? Dec 21, 2016
Contributor

jreback commented Dec 21, 2016

yep, this looks buggy. We need to mask the NaT's before rounding and put them back after, using self.hasnans (this is all on DatetimeIndex), though TimedeltaIndex and PeriodIndex need checking as well.

Further, it looks like NaT doesn't implement these Timestamp methods (they should return a NaT), so NaTType needs modifying.

In [6]: d['dtime'][1].round('s')
AttributeError: 'NaTType' object has no attribute 'round'

In [7]: d['dtime'][1].round('5s')
AttributeError: 'NaTType' object has no attribute 'round'

prob also buggy for .floor and .ceil
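
A small sketch of how the related cases mentioned here could be checked together. It assumes Python 3 and a pandas release in which the scalar `pd.NaT` exposes `round`, `floor` and `ceil`; the `series_results` and `scalar_results` names are illustrative only.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2010-01-01 23:14:12.000599", None]), name="dtime")

series_results = {}
scalar_results = {}
for method in ("round", "floor", "ceil"):
    # Series accessor: the NaT slot should survive every rounding method.
    series_results[method] = getattr(s.dt, method)("5s")
    # Scalar NaT: each method should simply hand back NaT.
    scalar_results[method] = getattr(pd.NaT, method)("5s")
    print(method, series_results[method].isna().tolist(), scalar_results[method])
```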

PR's welcome!

@jreback jreback added Bug Difficulty Intermediate Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Datetime Datetime data dtype labels Dec 21, 2016
@jreback jreback added this to the Next Major Release milestone Dec 21, 2016
@jreback jreback changed the title pandas.Series.dt.round inconsistent behaviour on NAT's with different arguments? BUG: pandas.Series.dt.round inconsistent behaviour on NAT's with different arguments? Dec 21, 2016
Contributor

discort commented Jan 13, 2017

I stepped through the example code in a debugger and found the following. This is the _round method:

    def _round(self, freq, rounder):

        from pandas.tseries.frequencies import to_offset
        unit = to_offset(freq).nanos

        # round the local times
        values = _ensure_datetimelike_to_i8(self)

        result = (unit * rounder(values / float(unit))).astype('i8')
        attribs = self._get_attributes_dict()
        if 'freq' in attribs:
            attribs['freq'] = None
        if 'tz' in attribs:
            attribs['tz'] = None
        return self._ensure_localized(
            self._shallow_copy(result, **attribs))

    @Appender(_round_doc % "round")
    def round(self, freq, *args, **kwargs):
        return self._round(freq, np.round)

Let's consider the first example, d['dtime'].dt.round('s'):

>>> self
DatetimeIndex(['2010-01-01 23:14:12.000599', 'NaT'], dtype='datetime64[ns]', name='dtime', freq=None)

Here unit equals 1000000000 and result = array([ 1262387652000000000, -9223372036854775808]). We are interested in the second item, the NaT.
In the second example, d['dtime'].dt.round('5s'), unit equals 5000000000 and result = array([ 1262387650000000000, -9223372035000000512]).
So the value in the slot that held NaT differs between the two examples: only the first is still the NaT sentinel.

Further investigation gave the following results:

  1. 1st example
>>> values[1] / unit
-9223372036.8547764
>>> rounder(values[1] / unit)
-9223372037.0
  2. 2nd example
>>> values[1] / unit
-1844674407.3709552
>>> rounder(values[1] / unit)
-1844674407.0

In the second case the quotient rounds to -1844674407.0, which is correct for ordinary mathematical rounding. But I think we definitely need a smarter rounding here, because the numbers are large and plain mathematical rounding does not bring us back to the NaT value.
I tried rounding with np.floor instead and got the desired result, array([ 1262387645000000000, -9223372036854775808]).
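
The arithmetic above can be replayed for the NaT slot alone in plain Python. This is only a sketch: `round_like_old_round` is a hypothetical helper that mimics `unit * rounder(values / float(unit))` for a single value and leaves out the final cast to int64. One plausible reading of the results is that for '5s' the product lands back inside the int64 range at a value other than the sentinel, while for 's' it falls below the int64 minimum, where the cast is not well defined and apparently happened to yield the sentinel again.

```python
NAT_I8 = -2**63  # int64 value used as the NaT sentinel

def round_like_old_round(value_i8, unit):
    """Mimic unit * rounder(value / float(unit)) for one scalar, without the int64 cast."""
    quotient = value_i8 / float(unit)
    return unit * float(round(quotient))

five_s = round_like_old_round(NAT_I8, 5_000_000_000)
one_s = round_like_old_round(NAT_I8, 1_000_000_000)

# '5s': representable in int64, but no longer equal to the sentinel.
print(int(five_s), int(five_s) == NAT_I8)
# 's': below the smallest int64, so a cast to int64 would overflow.
print(one_s < NAT_I8)
```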

Contributor

jreback commented Jan 13, 2017

@discort you don't need to worry about the calculation on the NaT at all. Simply use self._maybe_mask_results(result), which will turn the locations where the NaT's were back into NaT's.
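
The mask-and-restore idea can be sketched outside pandas with NumPy alone. This is not the pandas implementation of `_maybe_mask_results`; `round_i8_keep_nat` and `I_NAT` are hypothetical names used for illustration, and the function works on raw int64 nanosecond values.

```python
import numpy as np

I_NAT = np.iinfo(np.int64).min  # int64 sentinel that stands for NaT

def round_i8_keep_nat(values, unit, rounder=np.round):
    """Round int64 nanosecond values to a multiple of unit, keeping NaT slots as NaT."""
    values = np.asarray(values, dtype="i8")
    nat_mask = values == I_NAT
    # Replace NaT slots with 0 so the float arithmetic never sees the sentinel.
    safe = np.where(nat_mask, 0, values)
    result = (unit * rounder(safe / float(unit))).astype("i8")
    # Put the sentinel back wherever the input was NaT.
    result[nat_mask] = I_NAT
    return result

values = np.array([1262387652000000599, I_NAT], dtype="i8")
rounded = round_i8_keep_nat(values, 5_000_000_000)
print(rounded.view("M8[ns]"))
```

Whatever the rounding does to the placeholder value is irrelevant, because that slot is overwritten with the sentinel afterwards.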

@jreback jreback modified the milestones: 0.20.0, Next Major Release Jan 14, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
…erent arguments

closes pandas-dev#14940

Author: discort <[email protected]>

Closes pandas-dev#15124 from discort/dt_round_bug_14940 and squashes the following commits:

9e77191 [discort] added a test for Timestamp
52c897a [discort] BUG: added maybe_mask_results to '_round' method

3 participants