Incorrect nanosecond timestamps being written to influxdb #649
Comments
I have noted a further issue, #489, that identifies the same solution.
Extensive but not fully tested; unit test updates to follow. Also fixes influxdata#527, influxdata#489, influxdata#346, influxdata#344 and influxdata#340.
Same here. Using precision = 'ms' gives wrong timestamps, which has ruined a database for me due to stuff getting out of order. Floats are really not a good idea here.
A further issue has been identified by @tpietruszka: microsecond timepoints are not accurately saved when storing datapoints from a dataframe, which possibly affects saving timepoints at any precision. Links and further notes are in the comments of pull request #650 (comment).
Hi all, thanks for the active work on this. I just discovered this issue today in a script I am writing to sync two Influx nodes; fortunately I noticed it before unleashing the script on all my databases. I tried to review the progress on this issue, but I cannot really figure out why it has not been merged into master yet. Does it actually take care of ns precision properly now but fail some build checks that look to be related to pypy? Does this mean that unless you need pypy, all should be good?
Hi @cuxcrider, whilst this PR now works great for ns-resolution timestamps, there are 2 open issues.
The last time I looked at it, I adjusted the test cases for the issue that was identified. The modified test cases are in a branch on my fork: https://github.com/auphofBSF/influxdb-python/tree/bug/microsecond_issue_tpietruszka. These test case changes need to be merged into this PR, and the necessary changes made to support passing the adjusted test cases. From what I recall, and as described in the comment #650 (comment), we can fix this but just need to make a decision about truncation or rounding when converting between different resolutions. I will look at this again and see what can be done to get this PR fully resolved.
I have tested locally the changes to fix the precision conversions identified by @tpietruszka. I will merge these into this PR over the next 24 hours with updated tests and try to resolve all outstanding build checks for pypy.
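To illustrate the truncation-versus-rounding decision, here is a minimal sketch (the timestamp value is made up purely for illustration) of what each choice does when converting a nanosecond timepoint to microsecond resolution:

```python
import numpy as np

ns = np.int64(1596206455123456789)  # hypothetical nanosecond timestamp

truncated = ns // 1000          # floor division: drops the sub-microsecond part
rounded = (ns + 500) // 1000    # round-half-up before dividing

print(truncated)  # 1596206455123456
print(rounded)    # 1596206455123457
```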
@auphofBSF awesome! As far as I can tell your branch is working properly for me on nanoseconds, which I guess you already know. The "microsecond issue" seems more challenging to get to behave consistently, but it sounds like you have it nearly working.
Also, in case it helps someone: using the standard master-branch influxdb python package I did in fact inject a bunch of data points that are off by a nanosecond (or a few nanoseconds), leading to effectively duplicate datapoints in my database. As long as you do not care about having the "correct" nanosecond time of your data, you can downsample from your database and keep only one value from within each interval. I tried using InfluxDB aggregate functions, but that was turning into a nightmare because it renames field keys into things like "max_field_value" and "distinct_field_value". So I am opting to use this python package and apply pandas' resample function, i.e.:
df_downsample = df.resample('S').first()  # T = minutes, S = seconds, L = milliseconds
I think the only "gotcha" here is that you have no way of knowing the true nanosecond timestamp of the originally recorded data.
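For anyone wanting to reproduce that workaround, here is a minimal sketch of the resample approach (the column name, the timestamps, and the 1-second interval are just examples):

```python
import pandas as pd

# DataFrame indexed by a DatetimeIndex, possibly containing near-duplicate
# points whose timestamps differ only by a few nanoseconds
df = pd.DataFrame(
    {"value": [1.0, 1.1, 2.0]},
    index=pd.to_datetime([
        "2020-01-01 00:00:00.000000001",
        "2020-01-01 00:00:00.000000003",  # near-duplicate of the first point
        "2020-01-01 00:00:01.500000000",
    ]),
)

# Keep only the first point within each interval
# ('T' = minutes, 'S' = seconds, 'L' = milliseconds)
df_downsample = df.resample("S").first()
print(df_downsample)
```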
I came across one issue in your branch: it seems that using the dataframe client to write points will write dataframe columns that contain NaNs as strings. Or in other words, it won't write the data, thanks to Influx's inflexibility in that regard. I saw this is well documented here:
Hello @cuxcrider, can you please confirm that the commit of the branch you are using is 7e161e2, which incorporates the precision conversion fixes merged in today. What version of Pandas, Numpy, Python and platform are you running?
Can you also provide a snippet of code to illustrate the issue? All the current functional tests are passing, so this may be something not tested for. With this info I will look further into the documented issues you have listed. I have scanned through the code of this PR and compared it against current master 3d61f1f, and on first pass can't identify a probable cause, so will have to dig a little further.
I am now on your most up-to-date commit 7e161e2, and it seems to still have the issues. Here is code that runs fine on the official influxdb repo but breaks on your branch 7e161e2:
[code snippet not preserved]
Hello @cuxcrider, I can confirm I am getting the same issue when using just my branch with the ns fix. What is missing is all the fixes already merged into master influxdata/influxdb-python. As my PR #650 is still failing the pypy test, one has to merge this PR branch locally. I have set up a branch on my fork merging the primary master with my PR; if you do that locally, or pull my merge https://github.com/auphofBSF/influxdb-python/tree/test/merge_pr650issue649, it should work. This branch passes all existing tests except the pypy one. Here is the output on my test instance of your code. It would be good to get this pypy test failure sorted and then hopefully this PR can be merged. Hope this works well for you:
[output not preserved]
Wow, awesome, thanks for doing that! I know next to nothing about pypy, but looking for incompatibilities with Pandas seems like a reasonable place to start. In searching for "pypy pandas issues" I found this issue, which sounds like it has a relatively simple workaround: https://bitbucket.org/pypy/pypy/issues/2989/pandas-currently-fails-to-install-on-pypy3. Maybe it's something easy like that? Sorry I cannot be of much help. I hope some other folks can help so your branch can be merged into the official release; I would think people must have some major headaches with duplicate data otherwise. Nice work on your patch thus far!
Writing points with time precision in ns results in incorrect nanosecond timestamps being written to InfluxDB.
The effect can be noticed when writing points from both DataFrameClient.write_points and InfluxDBClient.write.
This has been noted in issues #527, #489, #346, #344 and #340, and a proposed pull request was submitted (#346). I believe this pull request does not go far enough.
#489 describes the same "floor divide" approach as part of the solution being proposed here.
The solution I am proposing in a pull request uses floor division and pandas.Timestamp, which supports ns resolution.
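As a sketch of the idea (the helper and precision table below are illustrative only, not the library's actual internals): keep the timestamp as an integer nanosecond value via pandas.Timestamp and floor-divide down to the requested precision, so no float round-trip ever occurs.

```python
import pandas as pd

# Illustrative only -- not the actual influxdb-python internals.
# Each precision code maps to the number of nanoseconds in one unit.
_PRECISION_FACTOR = {
    "n": 1,
    "u": 10**3,
    "ms": 10**6,
    "s": 10**9,
    "m": 60 * 10**9,
    "h": 3600 * 10**9,
}

def to_epoch(ts, precision="n"):
    """Convert a timestamp to an integer epoch at the given precision."""
    ns = pd.Timestamp(ts).value                 # integer nanoseconds since EPOCH
    return ns // _PRECISION_FACTOR[precision]   # floor divide: stays an integer

ts = pd.Timestamp("2024-10-04 23:10:55.123456789", tz="UTC")
print(to_epoch(ts, "n"))   # exact nanosecond epoch
print(to_epoch(ts, "u"))   # truncated to microsecond precision
```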
Implication
When trying to use the library to update a record, it instead writes a new record with the erroneous timestamp.
Affected code regions
_dataframe_client.py
    _convert_dataframe_to_json
    _convert_dataframe_to_lines
    _datetime_to_epoch
line_protocol.py
    _convert_timestamp
Explanation:
The division of an (np.int64 or int) nanosecond timepoint by 1 (ns precision) is producing errors; it is necessary to use the floor division operator //, which yields the correct (np.int64 or int) value.
If an np.int64 is divided (operator /) by an int, it yields a float64. When the initial np.int64 value is large enough, the conversion back to an np.int64 loses nanosecond precision. The same error occurs with standard Python int types.
Example (formulated in an IPython notebook; the original cells and output were not preserved here). The demonstration showed a non-zero Error(ns) for a large nanosecond timepoint, no error when checking with the existing unit-test timepoint, and a non-zero Error(ns) with the suggested unit-test timepoint.
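A reconstruction of that demonstration (a sketch only, built from the timepoints described in this issue; the exact notebook cells may have differed):

```python
import numpy as np
import pandas as pd

def error_ns(ts_ns):
    """Error introduced by true division (/) versus floor division (//) at ns precision."""
    via_float = np.int64(ts_ns / 1)   # / promotes to float64 before converting back
    exact = ts_ns // 1                # // stays an integer
    return int(via_float - exact)

epoch = pd.Timestamp("1970-01-01 00:00:00+00:00")

# Existing unit-test timepoint: EPOCH + 1 hour.
# Small enough to survive the float64 round-trip, so the tests pass.
small = np.int64((epoch + pd.Timedelta(hours=1)).value)
print("Error(ns) is", error_ns(small))   # 0

# Suggested unit-test timepoint: EPOCH + 20000 days 23h 10m 55.123456789s.
# Too large for float64 to represent exactly, so nanosecond precision is lost.
large = np.int64((epoch + pd.Timedelta(days=20000, hours=23, minutes=10,
                                       seconds=55, microseconds=123456,
                                       nanoseconds=789)).value)
print("Error(ns) is", error_ns(large))   # non-zero
```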
All locations where timepoints are calculated need modification to yield the expected result in nanosecond precision.
The unit tests do not show this up, as the two test timepoints are small enough not to expose the loss of precision.
The points are EPOCH '1970-01-01 00:00+00:00' and EPOCH + 1 hour. These two test points have only microsecond resolution with no nanosecond component.
I propose the second test case be something like EPOCH + 20000 days, 23h 10m 55.123456789s. It is necessary to change all calculations based on datetime (which only has microsecond resolution) to pandas Timedelta and Timestamp (which have nanosecond resolution).
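A short sketch of that resolution difference (the values mirror the proposed test point; datetime simply cannot carry the trailing 789 ns):

```python
from datetime import datetime, timedelta, timezone
import pandas as pd

# datetime: microsecond resolution only -- the trailing 789 ns cannot be expressed
dt = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
    days=20000, hours=23, minutes=10, seconds=55, microseconds=123456)

# pandas Timestamp/Timedelta: full nanosecond resolution
ts = pd.Timestamp("1970-01-01", tz="UTC") + pd.Timedelta(
    days=20000, hours=23, minutes=10, seconds=55,
    microseconds=123456, nanoseconds=789)

print(dt)             # ...55.123456+00:00  (no nanosecond component)
print(ts)             # ...55.123456789+00:00
print(ts.nanosecond)  # 789
```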
I am preparing a pull request that attempts to address all timestamp calculations and fixes the unit tests.
Product versions where the issue was discovered and where fixes are being tested:
np.version.full_version