Skip to content

DOC: timeseries.rst floating point precision (#15817) #15919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

mchwalisz
Copy link
Contributor

@codecov
Copy link

codecov bot commented Apr 6, 2017

Codecov Report

Merging #15919 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15919      +/-   ##
==========================================
- Coverage   90.97%   90.96%   -0.01%     
==========================================
  Files         145      145              
  Lines       49475    49475              
==========================================
- Hits        45008    45007       -1     
- Misses       4467     4468       +1
Flag Coverage Δ
#multiple 88.72% <ø> (-0.01%) ⬇️
#single 40.68% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/tseries/tools.py 85.55% <ø> (ø) ⬆️
pandas/core/common.py 90.68% <0%> (-0.35%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b53202...7b82e8b. Read the comment docs.

@@ -237,6 +237,10 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
integer or float number. This will be based off the origin.
Example, with unit='ms' and origin='unix' (the default), this
would calculate the number of milliseconds to the unix epoch start.

Warning: For float arg, precision rounding might happen. To prevent
unexpected behavior use a fixed-width exact type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is true only for ns (make that clear); i would put this in the Notes section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you mean for unit='ns' I don't agree. You can use unit='s' and have result saved up to mili seconds with rounding error from float. as in timeseries example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. ok in any even still put it in the Notes section.

@@ -265,6 +265,22 @@ Typical epoch stored units
pd.to_datetime([1349720105100, 1349720105200, 1349720105300,
1349720105400, 1349720105500 ], unit='ms')

Conversion of float epoch times can lead to inaccurate and unexpected results.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would make this a note

also units are rounded to the unit precision so the first example is misleading

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean the line as note or the whole section?

also unit is taken as the base but precision is taken from float. First example is exactly the problem I've had at the beginning.

In [36]: pd.to_datetime(1490195805.433502912, unit='s')
Out[36]: Timestamp('2017-03-22 15:16:45.433503')

It neither s nor ns in the example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, that was the example, ok then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this whole section you added to make a note (or could do a warning).

@jreback jreback added the Docs label Apr 6, 2017
pd.to_datetime(1490195805.433502912, unit='s')

pd.to_datetime(1490195805433502912, unit='ns')

These *work*, but the results may be unexpected.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may have to adjust this line after

@jreback jreback added the Datetime Datetime data dtype label Apr 6, 2017
@mchwalisz mchwalisz force-pushed the timeseries-precision branch from 93f8716 to cd10e92 Compare April 6, 2017 15:08
@mchwalisz
Copy link
Contributor Author

I've moved the warning in to_datetime to examples and rearranged sections in timeseries.rst to make it more logical (IMO).

@jreback jreback added this to the 0.20.0 milestone Apr 6, 2017
@jreback
Copy link
Contributor

jreback commented Apr 6, 2017

lgtm. @jorisvandenbossche

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments about the wording (to make sure everything is clear to the reader!)

.. warning::

Conversion of float epoch times can lead to inaccurate and unexpected results.
Python floats are unbounded precision, with about 15 significant digits (depends on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with "unbounded precision"? that seems to contradict a bit with "15 digits"


Conversion of float epoch times can lead to inaccurate and unexpected results.
Python floats are unbounded precision, with about 15 significant digits (depends on
operating system). Rounding in conversion from to high precision ``Timestamp`` is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be something wrong in the "in conversion from to high precision Timestamp" part (it doesn't read fluent to me)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flaot is missing. will correct it in a sec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls emphasize that this is only on input of very high precision float data.

@mchwalisz mchwalisz force-pushed the timeseries-precision branch from cd10e92 to e25033e Compare April 7, 2017 12:16

.. note::

Epoch times will be rounded to the nearest nanosecond.
These *work*, but the results may be unexpected.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part is now hanging, so add something like. numeric input is in referece to the epoch time (actually may want to move this mini-example (just this last part here) to the 'origin' section.)

@mchwalisz mchwalisz force-pushed the timeseries-precision branch from e25033e to 7b82e8b Compare April 7, 2017 12:47
@jreback jreback closed this in 860d555 Apr 8, 2017
@jreback
Copy link
Contributor

jreback commented Apr 8, 2017

thanks!

@mchwalisz mchwalisz deleted the timeseries-precision branch April 10, 2017 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_datetime() ns precision inconsistencies between s and ns
3 participants