This repository was archived by the owner on Oct 29, 2024. It is now read-only.

DataFrame write "nan", "inf" error in influxdb #422

Closed
shagru opened this issue Mar 7, 2017 · 6 comments · Fixed by #812

Comments

@shagru

shagru commented Mar 7, 2017

See #195.
In the latest version, writing a DataFrame that contains nan or inf elements to InfluxDB fails with an error. The problem is that the line protocol doesn't support "value1=nan" or "value1=inf". You can drop rows containing nan with df.dropna(), but in most cases you need to keep the values in the same row that are not nan, so dropping an entire row is not ideal. One solution would be to omit the fields whose values are nan or inf (that is, those failing np.isfinite()), for example changing "value1=0.3, value2=nan, value3=inf" to "value1=0.3". That way InfluxDB stores value1 and leaves value2 and value3 empty for that point.
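
To make the proposal concrete, here is a minimal sketch of the field filtering in plain Python. The helper name format_fields is hypothetical and is not part of influxdb-python; it assumes every field value is a float or None and ignores tags, timestamps and escaping.

```python
import math


def format_fields(fields):
    """Render only the finite values as a line-protocol field set.

    Hypothetical helper for illustration; assumes float-or-None values.
    """
    kept = []
    for key, value in fields.items():
        # Skip None, NaN, +inf and -inf: the line protocol cannot express them.
        if value is None or not math.isfinite(value):
            continue
        kept.append(f"{key}={value}")
    return ",".join(kept)


row = {"value1": 0.3, "value2": float("nan"), "value3": float("inf")}
print(format_fields(row))  # value1=0.3
```

If every field in a row is filtered out, the result is an empty string, and the whole point has to be skipped because a line needs at least one field.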

@shagru
Author

shagru commented Mar 7, 2017

One workaround for now is to write the DataFrame to InfluxDB column by column, calling dropna() on each column.

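import numpy as np

# Replace the float infinities (not the strings 'inf'/'-inf') with NaN,
# then write each column on its own so dropna() only removes that column's gaps.
df = df.replace([np.inf, -np.inf], np.nan)
for c in df.columns:
    client.write_points(df[[c]].dropna(), 'measurement')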
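
An alternative to one write per column is to build the points by hand and drop the non-finite fields row by row. The sketch below assumes a DataFrame with a DatetimeIndex and all-numeric columns; frame_to_points is a hypothetical helper, and the dict layout ("measurement", "time", "fields") is the point format that the plain InfluxDBClient.write_points accepts.

```python
import numpy as np
import pandas as pd


def frame_to_points(df, measurement):
    """Build one point dict per row, keeping only finite field values."""
    points = []
    for timestamp, row in df.iterrows():
        fields = {
            col: float(val)
            for col, val in row.items()
            if pd.notna(val) and np.isfinite(val)
        }
        if fields:  # a point with no fields cannot be written, so skip it
            points.append({
                "measurement": measurement,
                "time": timestamp.isoformat(),
                "fields": fields,
            })
    return points


df = pd.DataFrame(
    {
        "value1": [0.3, np.nan],
        "value2": [np.nan, np.inf],
        "value3": [np.inf, np.nan],
    },
    index=pd.to_datetime(["2017-03-07T00:00:00Z", "2017-03-07T00:01:00Z"]),
)
points = frame_to_points(df, "measurement")
```

The resulting list could then be passed to write_points on a regular InfluxDBClient in a single call. Row-wise iteration is slow on large frames, so this is only meant to show the idea.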

nmerket pushed a commit to nmerket/influxdb-python that referenced this issue Apr 10, 2017
@jackzampolin
Contributor

@patrickhoebeke
Contributor

See my proposed fix in pull request #507.

@tux-00

tux-00 commented Dec 15, 2017

> The solution can be to omit the value fields that is "nan" and "inf" (np.isfinite()), something like, changing "value1=0.3, value2=nan, value3=inf" to "value1=0.3".

I agree with this.

> It seems that in the latest version, if a DataFrame contains nan, inf elements, there will be error when writing to influxdb.

@shagru, are you talking about the latest version of InfluxDB or of influxdb-python?

@patrickhoebeke
Contributor

Indeed, we should reject nan, None and inf values. My original proposal only took nan and None values into account (detected using pd.isnull()). We should also exclude np.inf.
We could either:

  • use np.isinf
    or
  • use an option context:
    with pd.option_context('mode.use_inf_as_null', True):
        df = df.dropna(subset=['col1', 'col2'], how='all')

Do you want me to update my pull request or create a new one from the latest version of the repo?
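
For comparison, the same row filtering can be sketched with an explicit finiteness mask, which does not depend on any pandas option. The column names col1 and col2 are the placeholders from the example above, and the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, np.inf, np.nan],
    "col2": [np.nan, -np.inf, 2.0],
})

# where() keeps the finite cells and turns everything else (NaN, +inf, -inf) into NaN.
finite_only = df.where(np.isfinite(df))

# Drop only the rows in which both columns ended up missing.
cleaned = finite_only.dropna(subset=["col1", "col2"], how="all")
print(cleaned)
```

Here the middle row is removed because both of its values were infinite, while the other two rows keep their single finite value.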

@AndyMender

AndyMender commented Jan 25, 2018

I noticed that the DataFrameClient class implements an ignore_nan keyword argument, which defaults to True, and the pytest tests for nan conditions (either entire rows of a DataFrame or individual column values) seem to confirm that it works. However, I still hit the issue described above when writing DataFrames that contain nan values.

InfluxDB server version: 1.4.2
InfluxDB Python client version: 5.0.0

EDIT: Apparently ignore_nan is present only in the leftover InfluxDB v0.8 compatibility code, and the most recent version of the InfluxDB Python client (git pull from the master branch) handles DataFrames with nan values properly.

Could the package on PyPI be updated accordingly? :)
