dt.total_seconds() stores float with appending 0.00000000000001 #34290

Closed
sjvdm opened this issue May 21, 2020 · 8 comments · Fixed by #48218
Labels
good first issue · Needs Tests (Unit test(s) needed to prevent regressions) · Timedelta (Timedelta data type)

Comments

@sjvdm

sjvdm commented May 21, 2020

A simple test showing that when I take the difference of two datetime columns and use dt.total_seconds() to calculate the difference in seconds, the values are stored with a spurious offset of 0.00000000000001.

print(pd.__version__)
print(pd.show_versions())
1.0.3

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.8.0.final.0
python-bits      : 64
OS               : Linux
OS-release       : 3.10.0-1062.18.1.el7.x86_64
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.3
numpy            : 1.18.3
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 41.4.0
Cython           : 0.29.16
pytest           : 5.4.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.2.1
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 5.4.1
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : 1.3.16
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

IPython code:

import pandas as pd
import datetime

data = {'start':datetime.datetime(2020,1,1,12),
       'end':datetime.datetime(2020,1,1,12,2) 
       }

df = pd.DataFrame(data,index=[0])

#try to parse to pd.to_datetime
df['end'] = pd.to_datetime(df['end'])
df['start'] = pd.to_datetime(df['start'])

print('print calc differences')
print((df['end'] - df['start']).dt.total_seconds())
print((df['end'] - df['start']).dt.total_seconds().values[0])
print('')

print('testing on normal float')
print(float(120))
print('')

Output:

print calc differences
0    120.0
dtype: float64
120.00000000000001

testing on normal float
120.0
@jbrockmendel
Member

I can't reproduce this on OSX (py37) or Ubuntu (py36). @sjvdm can you reproduce this in any other Python versions?

@mroeschke added the "Needs Info" (Clarification about behavior needed to assess issue) label on May 22, 2020
@sjvdm
Author

sjvdm commented May 22, 2020

OK, so I reproduced this on two other versions:

The result of total_seconds() seems to be stored as 120.00000000000001

Python 2.7.5 (default, Apr  2 2020, 13:16:51)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import datetime
>>>
>>> data = {'start': datetime.datetime(2020, 1, 1, 12),
...         'end': datetime.datetime(2020, 1, 1, 12, 2)}
>>>
>>> df = pd.DataFrame(data, index=[0])
>>>
>>> # try to parse with pd.to_datetime
... df['end'] = pd.to_datetime(df['end'])
>>> df['start'] = pd.to_datetime(df['start'])
>>>
>>> print('print calc differences')
print calc differences
>>> print((df['end'] - df['start']).dt.total_seconds())
0    120.0
dtype: float64
>>> print((df['end'] - df['start']).dt.total_seconds().values[0])
120.00000000000001
>>>
>>> print('testing on normal float')
testing on normal float
>>> print(float(120))
120.0
@sjvdm
Author

sjvdm commented May 22, 2020

May also be related to floating point precision as a colleague pointed out: https://www.python.org/dev/peps/pep-0485/
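As an aside (a sketch, not from the thread): PEP 485 added math.isclose (Python 3.5+), which is the standard way to compare floats that carry this kind of rounding noise:

```python
import math

# The two values from the report differ only by floating-point noise,
# so they compare equal within math.isclose's default relative
# tolerance of 1e-9, even though strict equality fails.
print(math.isclose(120.00000000000001, 120.0))  # True
print(120.00000000000001 == 120.0)              # False
```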

@jorisvandenbossche
Member

I can "reproduce" this (on linux). Using a simpler example: with a Timedelta scalar it doesn't show the issue, but with a TimedeltaArray it does:

In [27]: td = pd.Timedelta("2 min")  

In [28]: td.total_seconds()  
Out[28]: 120.0

In [29]: print(td.total_seconds()) 
120.0

In [30]: arr = pd.array([pd.Timedelta("2 min")])  

In [31]: arr.total_seconds()   
Out[31]: array([120.])

In [32]: print(arr.total_seconds()[0])
120.00000000000001

But it is indeed quite probably a floating-point precision issue that can be ignored. total_seconds is implemented roughly like:

In [33]: 1e-9 * arr.asi8   
Out[33]: array([120.])

In [34]: (1e-9 * arr.asi8)[0] 
Out[34]: 120.00000000000001

@jorisvandenbossche
Member

The Timedelta scalar version (inherited from datetime.timedelta) is implemented with a division instead of a multiplication. That seems to make the difference:

# how it is implemented for Timedelta under the hood
In [43]: td.value / 1e9  
Out[43]: 120.0

# what basically happens in TimedeltaArray
In [44]: td.value * 1e-9 
Out[44]: 120.00000000000001
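For context (an aside, not from the thread): 1e9 is an integer and exactly representable as a binary float, while 1e-9 is not, which is plausibly why the two formulations round differently:

```python
from decimal import Decimal

ns = 120 * 10**9   # 2 minutes expressed in nanoseconds

# Dividing by the exactly representable 1e9 rounds to the exact answer.
print(ns / 1e9)    # 120.0

# 1e-9 has no exact binary representation; the value actually stored
# is slightly off from 10**-9, and the error surfaces in the product.
print(Decimal(1e-9))
print(ns * 1e-9)   # 120.00000000000001
```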

@mroeschke added the "Bug" and "Timedelta" (Timedelta data type) labels and removed the "Needs Info" (Clarification about behavior needed to assess issue) label on Mar 26, 2021
@fpavogt

fpavogt commented Jul 12, 2022

Looks like I just hit the same issue with pandas 1.4.3 under OSX 12.4 and Anaconda. Below is an MWE adapted from the case that led me to notice this.

Looks like this issue is still relevant!

Sample code:

import pandas as pd

out = pd.Series([pd.Timestamp('2022-03-16 08:32:26'), pd.Timestamp('2022-03-16 08:32:41')])

# Apply dt.total_seconds, and extract the second value
(out - out.iloc[0]).dt.total_seconds()[1]
# Returns 15.000000000000002

# Extract the Timedelta, and apply its total_seconds() method 
((out - out.iloc[0])[1]).total_seconds()
# Returns 15.0

# Same workaround, but via apply
(out - out.iloc[0]).apply(lambda x: x.total_seconds())[1]
# Returns 15.0

@jbrockmendel
Member

I get 15.0 in main. IIRC there was a PR last week that touched TimedeltaArray.total_seconds that might have fixed this.

@mroeschke
Member

I guess this could use a unit test to confirm.
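A regression test along these lines might look like the following sketch (the test name is hypothetical, and it assumes a pandas version that includes the fix from #48218):

```python
import pandas as pd

def test_timedelta_total_seconds_no_rounding_noise():
    # GH#34290: Series.dt.total_seconds() should agree exactly with
    # the scalar Timedelta.total_seconds() for whole-second deltas.
    out = pd.Series([pd.Timestamp('2022-03-16 08:32:26'),
                     pd.Timestamp('2022-03-16 08:32:41')])
    deltas = out - out.iloc[0]
    assert deltas.dt.total_seconds()[1] == 15.0
    assert deltas.dt.total_seconds()[1] == deltas[1].total_seconds()
```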
