Allow DataFrame.to_csv to represent inf and nan differently #2026

Closed
aflaxman opened this issue Oct 5, 2012 · 9 comments

@aflaxman
Contributor

aflaxman commented Oct 5, 2012

I would like the DataFrame.to_csv method to have an option to store inf values as "inf" and nan values as "nan". This will be important for computations that use inf and nan differently.

Currently:
In [72]: df = pd.DataFrame({'A': [inf]})
In [73]: print isinf(df)
      A
0  True
In [74]: df.to_csv('temp.csv')
In [75]: df2 = pd.read_csv('temp.csv', index_col=0)
In [76]: print isinf(df2)
       A
0  False

In the new implementation (possibly with an option passed to df.to_csv to select it), the result of isinf(df2) would be True.
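The round-trip property requested here can be illustrated without pandas. The following is a minimal sketch using only the Python standard library, not the pandas implementation; the helper names `write_floats` and `read_floats` are hypothetical and exist only for this example. It writes `inf`, `-inf` and `nan` as distinct tokens and checks that they come back as distinct values.

```python
import csv
import io
import math


def write_floats(values):
    """Write one float per row under a header 'A' (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["A"])
    for v in values:
        # repr() yields 'inf', '-inf' and 'nan' for the special values,
        # so each one gets its own token in the file.
        writer.writerow([repr(v)])
    return buf.getvalue()


def read_floats(text):
    """Parse the column back; float() accepts 'inf', '-inf' and 'nan'."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the header row
    return [float(row[0]) for row in rows]


original = [float("inf"), float("-inf"), float("nan"), 1.5]
restored = read_floats(write_floats(original))

print([math.isinf(v) for v in restored])  # [True, True, False, False]
print([math.isnan(v) for v in restored])  # [False, False, True, False]
```

The point of the sketch is only that infinities and NaN survive the trip as different values, which is the behavior asked for from `to_csv` and `read_csv`.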

@wesm
Member

wesm commented Oct 6, 2012

Agreed, this has been a long-standing oversight. See also #1919.

@aflaxman
Contributor Author

aflaxman commented Oct 6, 2012

I have a few hours to work on this. How do you recommend I proceed? I could make changes to to_csv / read_csv only, or I could work on it in a way more relevant to #1919. Is the isnull method mentioned there the place to start?

@aflaxman
Contributor Author

aflaxman commented Oct 6, 2012

Here is a start. I'm sure it is not the way you want to do things eventually, but maybe it will spark enough feedback to get things usable:

aflaxman@6f31f8c

It has a test that fails because from_csv does not work on a column of all infs (something to do with int vs float types, I believe).

@wesm
Member

wesm commented Oct 7, 2012

If you're up to tackling #1919, my plan was to make the inf-as-nan behavior globally configurable on the path to disabling it altogether (so users could switch it on and off to see if they are relying on it somewhere). The amount of Cython code that would need to be added is very small, and pandas/core/common.py could switch between the relevant null-checking functions.

Alternatively, you could do something simpler that addresses just this issue.
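As a rough illustration of the "switch between the relevant null-checking functions" idea, here is a minimal pure-Python sketch. Everything in it is an assumption made for the example: the flag `_USE_INF_AS_NULL`, the setter `set_inf_as_null` and the two checker functions are hypothetical names, and the real change would live in Cython and pandas/core/common.py rather than in code like this.

```python
import math

# Hypothetical module-level switch; not an actual pandas option name.
_USE_INF_AS_NULL = False


def _isnull_nan_only(value):
    """Treat only NaN as null."""
    return isinstance(value, float) and math.isnan(value)


def _isnull_nan_or_inf(value):
    """Treat NaN and +/-inf as null (the legacy behavior under discussion)."""
    return isinstance(value, float) and (math.isnan(value) or math.isinf(value))


def set_inf_as_null(flag):
    """Flip the global switch so callers can test whether they rely on it."""
    global _USE_INF_AS_NULL
    _USE_INF_AS_NULL = bool(flag)


def isnull(value):
    """Dispatch to whichever null check the global switch selects."""
    checker = _isnull_nan_or_inf if _USE_INF_AS_NULL else _isnull_nan_only
    return checker(value)


print(isnull(float("inf")))  # False: inf is an ordinary value by default
set_inf_as_null(True)
print(isnull(float("inf")))  # True: inf is now reported as null
```

Looking the checker up on every call keeps the switch effective immediately, at the cost of one extra branch per check.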

@aflaxman
Contributor Author

aflaxman commented Oct 7, 2012

Here is enough to get the infs to read and write successfully: aflaxman@d11eea2

I'll investigate the additional work necessary for #1919 before making a pull request... the interface for writing infs and nans I've implemented could be improved greatly by this.

@aflaxman
Contributor Author

aflaxman commented Oct 8, 2012

Here is a version that treats nan and inf differently, which simplifies writing csvs with infs: aflaxman@fe12c8a

It does not have an inf-is-nan option yet. I'm not sure what a good way to do that is.

This was referenced Oct 8, 2012
@wesm
Member

wesm commented Dec 2, 2012

This is all set. Parsing is working fine now too.

@wesm wesm closed this as completed Dec 2, 2012
@aflaxman
Contributor Author

aflaxman commented Dec 7, 2012

Thanks, I've confirmed that this is working for me, too. Unfortunately, I had to change a line in tslib.pyx to get it to build. I will comment about it on that commit, if that is the appropriate protocol.

@wesm
Member

wesm commented Dec 7, 2012

Sure, please do. You might have run up against a bug that was fixed in Cython 0.17.x.

2 participants