Allow DataFrame.to_csv to represent inf and nan differently #2026

aflaxman · 2012-10-05T14:29:33Z

I would like the DataFrame.to_csv method to have an option to store inf values as "inf" and nan values as "nan". This will be important for computations that use inf and nan differently.

Currently:
In [72]: df = pd.DataFrame({'A': [inf]})
In [73]: print isinf(df)
      A
0  True
In [74]: df.to_csv('temp.csv')
In [75]: df2 = pd.read_csv('temp.csv', index_col=0)
In [76]: print isinf(df2)
       A
0  False

In the new implementation (possibly with an option passed to df.to_csv to select it), the result of isinf(df2) would be True.
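The requested behavior can be written as a small round-trip check. This is a minimal sketch, assuming a recent pandas release in which the fix discussed in this thread is present. The helper name `roundtrip_csv` is made up for illustration, and an in-memory buffer stands in for `temp.csv`.

```python
import io

import numpy as np
import pandas as pd


def roundtrip_csv(df):
    """Write df to CSV text in memory and read it back (hypothetical helper)."""
    buf = io.StringIO()
    df.to_csv(buf)
    buf.seek(0)
    return pd.read_csv(buf, index_col=0)


df = pd.DataFrame({"A": [np.inf, -np.inf, np.nan, 1.5]})
df2 = roundtrip_csv(df)

# The two infinities should come back as infinities, not as missing values.
inf_preserved = bool(np.isinf(df2["A"].iloc[:2]).all())
# The NaN should come back as NaN, not as an infinity.
nan_preserved = bool(np.isnan(df2["A"].iloc[2]))
print(inf_preserved, nan_preserved)
```

If either flag is False, the installed version still conflates inf with missing values somewhere on the write or read path.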

wesm · 2012-10-06T17:23:28Z

Agreed, this has been a long-standing oversight. See also #1919.

aflaxman · 2012-10-06T22:01:21Z

I have a few hours to work on this. How do you recommend I proceed? I could make changes to to_csv / read_csv only, or I could work on it in a way more relevant to #1919. Is the isnull method mentioned there the place to start?

aflaxman · 2012-10-06T23:41:48Z

Here is a start. I'm sure it is not the way you want to do things eventually, but maybe it will spark enough feedback to get things usable:

aflaxman@6f31f8c

It has a test that fails because from_csv does not work on a column of all infs (something to do with int vs float types, I believe).

wesm · 2012-10-07T16:20:31Z

If you're up to tackling #1919, my plan was to make the inf-as-nan behavior globally configurable on the path to disabling it altogether (so users could switch it on and off to see if they are relying on it somewhere). The amount of Cython code that would need to be added is very small, and pandas/core/common.py could switch between the relevant null-checking functions.

Alternatively, you could do something simpler to just address this issue.
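The idea of switching between null-checking functions from one global setting could look roughly like the following. This is a sketch in plain NumPy under stated assumptions, not the code in pandas/core/common.py. The names `set_inf_as_null`, `_isnull_strict` and `_isnull_inf_as_null` are hypothetical.

```python
import numpy as np


def _isnull_strict(values):
    """Only NaN counts as missing (hypothetical helper)."""
    return np.isnan(np.asarray(values, dtype=float))


def _isnull_inf_as_null(values):
    """NaN, inf and -inf all count as missing (hypothetical helper)."""
    return ~np.isfinite(np.asarray(values, dtype=float))


# Module-level pointer to whichever checker is currently active.
_isnull = _isnull_strict


def set_inf_as_null(flag):
    """Globally choose which null-checking function isnull() dispatches to."""
    global _isnull
    _isnull = _isnull_inf_as_null if flag else _isnull_strict


def isnull(values):
    return _isnull(values)


set_inf_as_null(True)
legacy_mask = isnull([np.inf, np.nan, 1.0])

set_inf_as_null(False)
strict_mask = isnull([np.inf, np.nan, 1.0])
```

Flipping the flag lets a user run the same code under both behaviors and compare results, which is the purpose described above.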

aflaxman · 2012-10-07T18:24:08Z

Here is enough to get the infs to read and write successfully: aflaxman@d11eea2

I'll investigate the additional work necessary for #1919 before making a pull request... the interface for writing infs and nans I've implemented could be improved greatly by this.

aflaxman · 2012-10-08T21:11:34Z

Here is a version that treats nan and inf differently, which simplifies writing csvs with infs: aflaxman@fe12c8a

It does not have an inf-is-nan option yet. I'm not sure what a good way to do that is.
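For reference, the output format requested at the top of this issue (inf stored as "inf", nan stored as "nan") can be illustrated with the existing `na_rep` parameter of `to_csv`. This is a minimal sketch assuming a recent pandas release. It does not show what the commit above implements.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.inf, -np.inf, np.nan]})

# na_rep controls how missing values are written; infinities are written
# as their own tokens and are not affected by it.
text = df.to_csv(na_rep="nan")
lines = text.splitlines()
print(text)
```

With the default `na_rep=''`, the NaN row would be written as an empty field instead.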

wesm · 2012-12-02T16:55:23Z

This is all set. Parsing is working fine now too.

aflaxman · 2012-12-07T03:00:37Z

Thanks, I've confirmed that this is working for me, too. Unfortunately, I had to change a line in tslib.pyx to get it to build. I will comment about it on that commit, if that is the appropriate protocol.

wesm · 2012-12-07T15:16:57Z

Sure, please do. You might have run up against a bug that was fixed in Cython 0.17.x.

lodagro mentioned this issue Oct 8, 2012

read_csv treats both inf and -inf as very large negative number #2041

Closed

This was referenced Oct 8, 2012

Stop treating inf/-inf as missing #1919

Closed

To csv infs #2050

Merged

wesm closed this as completed Dec 2, 2012
