- The code is hard to understand. It builds up a string expression and then runs it through `eval`, which is error-prone and has potential for security flaws.
- When there are null values in the data, it passes the encoded points through a regular-expression-based translation phase, which can corrupt data.
- The line-protocol encoding code amounts to a second encoder, independent of the one in influxdb_client/client/write/point.py, which leaves room for the two to have different encoding bugs.
- The code has undocumented (and probably unwanted) side effects on the data frame being encoded: default tags are added directly to the `data_frame` object rather than only to the encoded line-protocol points.
- Any `DataFrame` index values are converted to timestamps regardless of whether that's appropriate. It would be better if that were something to opt into explicitly (converting the default `RangeIndex` index into time values is almost never going to be correct; a better default would probably be to omit the timestamps and let the server add them).
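To make the points above concrete, here is a minimal sketch of what a more direct encoding could look like. It is not the library's implementation: `encode_row`, `escape_key`, `escape_measurement`, `is_null` and `format_field` are hypothetical helpers, rows are plain dicts rather than a `DataFrame`, and the escaping is simplified. The idea is only that null fields are skipped while encoding (so no regular-expression pass is needed afterwards), default tags are merged into the encoded line without touching the input, and the timestamp is opt-in.

```python
import math
from typing import Any, Mapping, Optional, Sequence


def escape_key(text: str) -> str:
    """Escape a tag key, tag value, or field key (comma, equals sign, space)."""
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def escape_measurement(text: str) -> str:
    """Escape a measurement name (comma, space)."""
    return text.replace(",", "\\,").replace(" ", "\\ ")


def is_null(value: Any) -> bool:
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def format_field(value: Any) -> str:
    """Format a field value as line protocol (simplified)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def encode_row(
    measurement: str,
    row: Mapping[str, Any],
    tag_columns: Sequence[str] = (),
    default_tags: Optional[Mapping[str, str]] = None,
    timestamp: Optional[int] = None,
) -> Optional[str]:
    """Encode one row; returns None when every field is null."""
    tags = dict(default_tags or {})  # copy, so the caller's mapping is untouched
    fields = {}
    for key, value in row.items():
        if is_null(value):
            continue  # nulls are dropped here, not patched out of the text later
        if key in tag_columns:
            tags[key] = str(value)
        else:
            fields[key] = value
    if not fields:
        return None
    head = ",".join(
        [escape_measurement(measurement)]
        + [f"{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())]
    )
    body = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp is not None:  # timestamps are opt-in; omitted by default
        line += f" {timestamp}"
    return line


rows = [
    {"location": "coyote_creek", "level water_level": 1.0, "str": "a"},
    {"location": "coyote_creek", "level water_level": None, "str": "b"},
]
for row in rows:
    print(encode_row("h2o_feet", row, tag_columns=["location"]))
```

In this sketch the second row simply loses its null field while its string field is left alone, and `rows` is unchanged afterwards. In the real client, building each point through the existing `Point` class would also avoid maintaining a second set of escaping rules.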
Here is some example code that demonstrates corruption of data:
import pandas as pd
from influxdb_client.client.write_api import PointSettings
from influxdb_client.client.write.dataframe_serializer import data_frame_to_list_of_points

# The second row has a null value, and one column name contains a space,
# so its field key has to be escaped in line protocol.
frame = pd.DataFrame(
    data=[
        ["coyote_creek", 1.0, "a"],
        ["coyote_creek", None, "b"],
        ["coyote_creek", 3.0, "c"],
        ["coyote_creek", 4.0, "d"],
    ],
    index=[1, 2, 3, 4],
    columns=["location", "level water_level", "str"],
)

ps = data_frame_to_list_of_points(
    frame,
    PointSettings(),
    data_frame_measurement_name="h2o_feet",
    data_frame_tag_columns=["location"],
)
for p in ps:
    print(p)
The code in dataframe_serializer.py could use some improvement.
The example code above prints:
Note that two string values are corrupted and the null value is still present, because the regular-expression code does not take the escaped key into account.
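As an illustration of why this kind of post-processing is fragile, here is a small sketch. The pattern below is hypothetical and is not the regular expression used in dataframe_serializer.py; it only assumes a clean-up pass that deletes `key=nan` fields from already-encoded text and that field keys never contain spaces, commas, or equals signs.

```python
import re

# Hypothetical clean-up pattern (NOT the library's actual regex): drop any
# "key=nan" field, assuming keys contain no spaces, commas, or equals signs.
NAIVE_NULL_FIELD = re.compile(r"[^ ,=]+=nan,?")


def strip_nulls_naively(line: str) -> str:
    """Remove null fields by editing the encoded text."""
    return NAIVE_NULL_FIELD.sub("", line)


# A plain key works as intended.
plain = 'h2o_feet,location=coyote_creek water_level=nan,str="b" 2'
# An escaped key: only the part after the escaped space is removed.
escaped = 'h2o_feet,location=coyote_creek level\\ water_level=nan,str="b" 2'
# A string value that happens to contain "=nan" gets cut apart.
in_string = 'h2o_feet,location=coyote_creek note="x=nan,y" 2'

print(strip_nulls_naively(plain))      # h2o_feet,location=coyote_creek str="b" 2
print(strip_nulls_naively(escaped))    # h2o_feet,location=coyote_creek level\ str="b" 2
print(strip_nulls_naively(in_string))  # h2o_feet,location=coyote_creek note=y" 2
```

Once the values have been flattened into text, a pattern has no reliable way to tell a key from a string value or to see escapes, which is why dropping nulls before encoding is the safer design.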
I would expect the example code to print this instead: