Skip to content

chore: influxdb_client/client/write: fix data_frame_to_list_of_points #183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented Jan 13, 2021

Fix the possibility of data corruption by using a much simpler
regular expression to fix up the results.

Also avoid mutating the DataFrame that's been passed in
by making a shallow copy. This changes semantics
so that if a column mentioned in PointSettings exists
in the actual data too, it won't be overridden.

Also change the test_write_data_frame test, which is essentially benchmarking
the data_frame_to_list_of_points function, so that it just does that
so it can easily be run on a local machine and is independent
of network speed. It runs in about 10s, which is comparable
to the previous performance.

Closes #182

  • CHANGELOG.md updated
  • Rebased/mergeable
  • A test has been added if appropriate
  • pytest tests completes successfully
  • Commit messages are in semantic format

@rogpeppe rogpeppe force-pushed the rog-002-fix-dataframe-serializer branch 3 times, most recently from 735c358 to caba68a Compare January 13, 2021 09:30
@rogpeppe rogpeppe changed the title Rog 002 fix dataframe serializer chore: influxdb_client/client/write: fix data_frame_to_list_of_points Jan 13, 2021
@rogpeppe rogpeppe force-pushed the rog-002-fix-dataframe-serializer branch 8 times, most recently from d2f85a8 to f4be9f4 Compare January 13, 2021 15:12
@codecov
Copy link

codecov bot commented Jan 13, 2021

Codecov Report

Merging #183 (51174a1) into master (91dcafb) will increase coverage by 0.08%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #183      +/-   ##
==========================================
+ Coverage   89.72%   89.81%   +0.08%     
==========================================
  Files          26       26              
  Lines        1957     1963       +6     
==========================================
+ Hits         1756     1763       +7     
+ Misses        201      200       -1     
Impacted Files Coverage Δ
...fluxdb_client/client/write/dataframe_serializer.py 98.66% <100.00%> (+1.56%) ⬆️
influxdb_client/client/write/point.py 98.36% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 91dcafb...51174a1. Read the comment docs.

print(f'measurement_name: {measurement_name}')
print(f'keys: {keys}')
print(f'tag columns: {data_frame_tag_columns}')
print(f'lambda p: f"""{{measurement_name}}{tags} {fields} {timestamp}"""')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the prints meant to stay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, that's just me debugging. This PR is still a draft as yet.

@rogpeppe rogpeppe force-pushed the rog-002-fix-dataframe-serializer branch 2 times, most recently from 9da33f0 to 8f03699 Compare January 14, 2021 13:19
@rogpeppe rogpeppe marked this pull request as ready for review January 14, 2021 14:30
@rogpeppe rogpeppe force-pushed the rog-002-fix-dataframe-serializer branch from 8f03699 to 64df1c7 Compare January 14, 2021 15:39
Fix the possibility of data corruption by using a much simpler
regular expression to fix up the results.

Also avoid mutating the DataFrame that's been passed in
by making a shallow copy. This changes semantics
so that if a column mentioned in `PointSettings` exists
in the actual data too, it won't be overridden.

Also change the `test_write_data_frame` test, which is essentially benchmarking
the `data_frame_to_list_of_points` function, so that it just does that
so it can easily be run on a local machine and is independent
of network speed. It runs in about 10s, which is comparable
to the previous performance.
@rogpeppe rogpeppe force-pushed the rog-002-fix-dataframe-serializer branch from 64df1c7 to 51174a1 Compare January 14, 2021 15:53
Copy link
Contributor

@bednar bednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for awesome PR - nice code docs, perfect performance 👍

@bednar bednar merged commit 5e6569c into influxdata:master Jan 18, 2021
@bednar bednar added this to the 1.14.0 milestone Jan 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

using Pandas DataFrame can corrupt data
3 participants