Allow DataFrame.to_csv to represent inf and nan differently #2026
Comments
Agreed, this has been a long-standing oversight. See also #1919.
I have a few hours to work on this. How do you recommend I proceed? I could make changes to to_csv / read_csv only, or I could work on it in a way more relevant to #1919. Is the isnull method mentioned there the place to start?
Here is a start, I'm sure it is not the way you want to do things eventually, but maybe it will spark enough feedback to get things usable: It has a test that fails because from_csv does not work on a column of all infs (something to do with int vs float types, I believe).
If you're up to tackling #1919, my plan was to make the Alternately you could do something simpler to just address this issue
Here is enough to get the infs to read and write successfully: aflaxman@d11eea2 I'll investigate the additional work necessary for #1919 before making a pull request... the interface for writing infs and nans I've implemented could be improved greatly by this.
Here is a version that treats nan and inf differently, which simplifies writing csvs with infs: aflaxman@fe12c8a It does not have an inf-is-nan option yet. I'm not sure what a good way to do that is.
This is all set. Parsing is working fine now too.
Thanks, I've confirmed that this is working for me, too. Unfortunately, I had to change a line in tslib.pyx to get it to build. I will comment about it on that commit, if that is the appropriate protocol.
Sure, please do. You might have run up against a bug that was fixed in Cython 0.17.x.
I would like the DataFrame.to_csv method to have an option to store inf values as "inf" and nan values as "nan". This will be important for computations that use inf and nan differently.
Currently:
In [72]: df = pd.DataFrame({'A': [inf]})
In [73]: print isinf(df)
      A
0  True
In [74]: df.to_csv('temp.csv')
In [75]: df2 = pd.read_csv('temp.csv', index_col=0)
In [76]: print isinf(df2)
       A
0  False
In the new implementation (possibly with an option passed to df.to_csv to select it), the result of isinf(df2) would be True.
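For anyone who wants to check the requested round trip on their own install, here is a minimal sketch. It assumes a reasonably recent pandas in which to_csv writes infinities as "inf" / "-inf" and read_csv parses them back as floats; the in-memory buffer and the choice of na_rep="nan" are illustrative, not necessarily the interface that was merged for this issue.

```python
import io

import numpy as np
import pandas as pd

# A column mixing +inf, -inf, nan and an ordinary float.
df = pd.DataFrame({"A": [np.inf, -np.inf, np.nan, 1.5]})

# Write to an in-memory buffer instead of temp.csv.
# na_rep controls how missing values (nan) are written; with the default
# (an empty string) nan is written as an empty field.
buf = io.StringIO()
df.to_csv(buf, na_rep="nan")

# Read the same text back, using the first column as the index.
buf.seek(0)
df2 = pd.read_csv(buf, index_col=0)

# If the round trip preserves the distinction, these two masks differ.
inf_mask = np.isinf(df2["A"]).tolist()
nan_mask = np.isnan(df2["A"]).tolist()
print(inf_mask)
print(nan_mask)
```

If the two printed masks match the masks of the original df, inf and nan survived the write and read as distinct values.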