DataFrame.to_csv/read_csv inconsistency #4595
Comments
I wonder if it would be reasonable to try to convert dates automatically and leave as a string if that fails. It also might be reasonable to set These are both backwards incompatible API changes though, and over here we really try hard to be backwards-compatible. @gregstg Also realize that with That said, we do aim to please our users as much as possible. |
Firstly, I can't reproduce this with pandas 0.12(-ish):
Perhaps you're using an earlier version? Secondly, dataframes are "invariant" when roundtrip-ed, but that's not what Finally, note that |
@y-p in your example, the second |
but it still works:
In [63]: import pandas.io.data as web
...: df = web.get_data_yahoo('IBM', '1/1/2000', '1/1/2010')
...: print type(df.index),df.columns
...: df.to_csv('1.csv')
...: new_df = df.from_csv('1.csv')
...: print type(new_df.index),new_df.columns
<class 'pandas.tseries.index.DatetimeIndex'> Index([u'Open', u'High', u'Low', u'Close', u'Volume', u'Adj Close'], dtype=object)
<class 'pandas.tseries.index.DatetimeIndex'> Index([u'Open', u'High', u'Low', u'Close', u'Volume', u'Adj Close'], dtype=object) |
I checked things in the console first, then mis-edited the snippet manually. My bad.
@y-p Why the close? I think the OP has a point w.r.t. datetimes. Should I open another issue for that? Or do you not agree? |
What point is that? I don't see anything remaining. It's not even reproducible with the But by all means, reopen if you want to. |
@cpcloud note that if you load the dataset from csv, it comes back as a datetime index (maybe it doesn't on older versions of pandas) so the specific problem OP brought up has been resolved or is due to some kind of error. |
@jtratner No, it doesn't, which is the point I'm talking about. I'm aware of the
@y-p For the record, I'm also against changing The previous comment does show an inconsistency with
Was |
IIUC, from_csv is supposed to be the inverse of to_csv (and to accomplish this it calls read_csv with a couple of arguments). However, this is quite confusing, and there is an issue to deprecate from_csv entirely. As @y-p points out, CSV is not an invariant format (at least not unless you tag it with extra metadata, which makes it pretty non-generic). So what I think we should do is this
and either
or
+1 for a doc note here |
Both I would suggest just leaving things as they are. A doc note doesn't hurt (and doesn't
Folks:
Someone said that the phenomenon I mentioned is not reproducible in the
In[5]: pd.version
In[6]: all_data ={}
In[7]: for ticker in ['AAPL', 'IBM', 'MSFT', 'GOOG']:
In[8]: price = DataFrame({tic: data['Adj Close'] for tic, data in
In[9]: price.index[0:3]
In[10]: price.to_csv('price2.csv')
In[11]: price2 = pd.read_csv('price2.csv')
In[12]: price2.index[0:3]
This is a paste from Canopy, where I have entered the 'In' statements by
This example is basically from p 139 of Wes McKinney's text.
The point is,
It was pointed out, and I am grateful for this, that I should have used
Respectfully,
Greg St. George
On Sun, Aug 18, 2013 at 10:10 AM, y-p [email protected] wrote:
Greg - sorry about that, I guess I was wrong about what was going on! I |
Jeff, Not at all, no offense taken. Greg On Sun, Aug 18, 2013 at 3:23 PM, Jeff Tratner [email protected]:
I would just like to note that this is still an issue in 2018. Is there a workaround other than formatting the to_csv output?
And I would like to note that CSV is not a format meant for perfect roundtripping, as by definition you lose type information. If you don't want to do manual formatting on reading in, there are plenty of methods that try to do better roundtripping (parquet, feather, JSON table schema, ...).
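To illustrate the point about lost type information, here is a minimal sketch using only Python's standard `csv` module (not pandas). The column names and values are made up for illustration. It shows that dates and floats written to CSV come back as plain strings, so the reader has to restore the types itself.

```python
import csv
import io
from datetime import date

# Hypothetical sample rows: a date and a float per row (values are made up).
rows = [(date(2000, 1, 3), 111.9), (date(2000, 1, 4), 102.5)]

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Date", "Adj Close"])
writer.writerows(rows)

# Read the same text back: every field is now a str.
buffer.seek(0)
reader = csv.reader(buffer)
header = next(reader)
read_back = [tuple(row) for row in reader]

# The types have to be restored by hand (or by hints passed to the reader).
restored = [(date.fromisoformat(d), float(p)) for d, p in read_back]
```

Nothing in the CSV text says that the first column holds dates, which is why any reader has to either guess or be told.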
This is not a bug; maybe it is more of a philosophical issue.
When I download data from Yahoo, mimicking the code on p. 139 of the book (Python for Data Analysis), it creates a DataFrame. The index for this DataFrame is a time series index: timestamps provided by Yahoo.
If I save this DataFrame to disk using to_csv and then read it back using read_csv the resulting data frame now has a 'normal' range type index [0,1,2...n] and a new column labelled 'Date' which now contains the dates.
This means that code written to analyze this data needs to be different depending on whether it has been saved or not. I can deal with it as is, but I think this is not the best design; DataFrames should be invariant under saving.
Just my $0.02.
Respectfully,
Greg St. George
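For readers who hit the same behavior, the following is a minimal sketch of the roundtrip described above. It assumes a reasonably recent pandas, and it uses a small made-up frame in place of the Yahoo download (the tickers and prices are placeholders). A plain `read_csv` produces a range index plus a `Date` column, while passing `index_col=0` and `parse_dates=True` brings back a `DatetimeIndex`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded price data (values are made up).
index = pd.date_range("2000-01-03", periods=3, freq="D", name="Date")
price = pd.DataFrame(
    {"AAPL": [1.0, 2.0, 3.0], "IBM": [4.0, 5.0, 6.0]},
    index=index,
)

# With no path argument, to_csv returns the CSV text as a string.
csv_text = price.to_csv()

# Default read_csv: the dates land in an ordinary "Date" column of strings
# and the frame gets a fresh 0..n-1 range index.
plain = pd.read_csv(io.StringIO(csv_text))

# Tell read_csv that the first column is the index and should be parsed as dates.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(plain.index).__name__, list(plain.columns))
print(type(restored.index).__name__, list(restored.columns))
```

This does not make CSV a typed format. It only supplies on the reading side the information that the file itself does not carry.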