
Incorrect dataframe serialisation if you have multiple columns starting with digits and the first alphabetically sorted column has a NaN value #485

Closed
fdorssers opened this issue Aug 8, 2022 · 0 comments · Fixed by #486

fdorssers commented Aug 8, 2022

Steps to reproduce:
I've made a small Python script that showcases the issue:

import os
import pandas as pd
import numpy as np
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import PointSettings
from influxdb_client.client.write.dataframe_serializer import data_frame_to_list_of_points

conn = InfluxDBClient(
    url="http://localhost:8086",
    token="<token>",
    org="<org>",
)
write_api = conn.write_api()
df = pd.DataFrame(
    index=[pd.Timestamp("2022-07-29 00:01:00", tz="UTC")],
    data={
        "1_col": np.nan,
        "2_col": 1.1,
        "a_col": 2.2
    }
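)
# The write call that triggers the error. "<bucket>" is a placeholder (assumption);
# the measurement name is taken from the error message below.
write_api.write(bucket="<bucket>", record=df, data_frame_measurement_name="test_measurement")
# The batch item wasn't processed successfully because: (400)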
# Reason: Bad Request
# HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json; charset=utf-8', 'X-Influxdb-Build': 'OSS', 'X-Influxdb-Version': 'v2.3.0+SNAPSHOT.090f681737', 'X-Platform-Error-Code': 'invalid', 'Date': 'Mon, 08 Aug 2022 07:48:43 GMT', 'Content-Length': '128'})
# HTTP response body: {"code":"invalid","message":"unable to parse 'test_measurement ,2_col=1.1,a_col=2.2 1659052860000000000': invalid field format"}

data_frame = pd.DataFrame(data={
    '1value': [np.nan],
    'avalue': [  30.0],
    'bvalue': [  30.0]
}, index=pd.period_range('2020-05-24 10:00', freq='H', periods=1))

points = data_frame_to_list_of_points(data_frame,
                                      PointSettings(),
                                      data_frame_measurement_name='test')
# ['test avalue=30.0,bvalue=30.0 1590314400000000000'] ✅

data_frame = pd.DataFrame(data={
    '1value': [np.nan,   30.0, np.nan,   30.0, np.nan],
    '2value': [  30.0, np.nan, np.nan, np.nan, np.nan],
    '3value': [  30.0,   30.0,   30.0, np.nan, np.nan],
    'avalue': [  30.0,   30.0,   30.0,   30.0,   30.0]
}, index=pd.period_range('2020-05-24 10:00', freq='H', periods=5))

points = data_frame_to_list_of_points(data_frame,
                                      PointSettings(),
                                      data_frame_measurement_name='test')
# ['test ,2value=30.0,3value=30.0,avalue=30.0 1590314400000000000', ❌
#  'test 1value=30.0,3value=30.0,avalue=30.0 1590318000000000000',  ✅
#  'test ,3value=30.0,avalue=30.0 1590321600000000000',             ❌
#  'test 1value=30.0,avalue=30.0 1590325200000000000',              ✅
#  'test avalue=30.0 1590328800000000000']                          ✅
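
The broken rows above all share one symptom: the field set begins with a comma. A minimal sketch of a check for that symptom follows. The helper name `field_set_starts_with_comma` is hypothetical and not part of `influxdb_client`, and the check assumes the measurement and tags contain no escaped spaces, which holds for the lines in this issue.

```python
def field_set_starts_with_comma(line: str) -> bool:
    """Return True if the field set of a line-protocol string begins with a comma.

    Hypothetical helper for illustration only. Assumes the measurement and
    tag set contain no escaped spaces.
    """
    # Layout: "<measurement>[,tags] <field set> [timestamp]"
    parts = line.split(" ")
    return len(parts) >= 2 and parts[1].startswith(",")


# The five serialised lines reported in this issue.
lines = [
    "test ,2value=30.0,3value=30.0,avalue=30.0 1590314400000000000",
    "test 1value=30.0,3value=30.0,avalue=30.0 1590318000000000000",
    "test ,3value=30.0,avalue=30.0 1590321600000000000",
    "test 1value=30.0,avalue=30.0 1590325200000000000",
    "test avalue=30.0 1590328800000000000",
]

flags = [field_set_starts_with_comma(line) for line in lines]
print(flags)  # [True, False, True, False, False]
```

The flagged lines are the first and third, the same two rows marked ❌ above.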

Expected behavior:
Data should be stored in InfluxDB if I try to save a DataFrame that has multiple columns starting with digits where NaNs might occur.

Actual behavior:
With the columns sorted alphabetically, if the first column starts with a digit and holds a NaN, the row is serialised with a leading comma in the field set (which InfluxDB rejects with a 400 "invalid field format" error) whenever the next column that has a value also starts with a digit. If the next column that has a value starts with a letter, the row is serialised correctly.
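
To make the expected output concrete, here is a minimal sketch of NaN-skipping serialisation for a single row. It is neither the library's implementation nor the fix in #486; `row_to_line` is a hypothetical helper that drops NaN fields before joining, so no separator is emitted for a missing value.

```python
import math


def row_to_line(measurement: str, fields: dict, timestamp_ns: int):
    """Serialise one row to a line-protocol string, skipping NaN fields.

    Hypothetical helper for illustration only; handles float fields, no tags.
    Returns None when every field is NaN, since such a row has no field set.
    """
    # Filter first, then join: commas only ever appear between kept fields.
    kept = [
        f"{name}={value}"
        for name, value in sorted(fields.items())
        if not (isinstance(value, float) and math.isnan(value))
    ]
    if not kept:
        return None
    return f"{measurement} {','.join(kept)} {timestamp_ns}"


# First failing row from the example above.
line = row_to_line(
    "test",
    {"1value": math.nan, "2value": 30.0, "3value": 30.0, "avalue": 30.0},
    1590314400000000000,
)
print(line)  # test 2value=30.0,3value=30.0,avalue=30.0 1590314400000000000
```

Filtering before joining avoids any dependence on which column happens to come first or on what character its name starts with.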

Time  1_col  2_col  3_col  a_col  Result
...   NaN    30.0   30.0   30.0   Fails
...   30.0   NaN    30.0   30.0   Works
...   NaN    NaN    30.0   30.0   Fails
...   30.0   NaN    NaN    30.0   Works
...   NaN    NaN    NaN    30.0   Works

However, if the first column is the only one whose name starts with a digit, there is no problem, even when it holds a NaN.

Time  1_col  a_col  Result
...   NaN    30.0   Works

Other:
I've also opened a PR: #486. I wasn't sure whether a PR alone was enough or whether an issue was also required, so I created both.

Specifications:

  • Client Version: 1.31.0
  • InfluxDB Version: 2.3.0 (docker)
  • Platform: Intel MacBook Pro running macOS 12.5