read_csv interprets index column as dates #225

Closed
takluyver opened this issue Oct 12, 2011 · 6 comments

@takluyver
Contributor

I loaded a dataset with an index column going from 1 to 304. When I then try to get the repr of the dataframe, it raises ValueError: year=100 is before 1900; the datetime strftime() methods require year >= 1900. Looking at df.index shows that the numeric indices have been transformed into datetimes (1-31 as days of this month, 32-60 as years 20xx, 61-99 as years 19xx, and higher numbers as a raw year).

@wesm
Member

wesm commented Oct 12, 2011

Hm, this is caused by a bit of legacy cruft. Perhaps it's time to add a parse_dates=True option to read_csv/read_table and have it default to False. Currently it attempts to parse the index column expecting time series data but it's apparently a bit too aggressive.
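For illustration, here is a minimal sketch of the opt-in behaviour proposed above. It assumes a current pandas release, where parse_dates defaults to False, rather than the 0.4.x code under discussion; the CSV contents are made up.

```python
import io

import pandas as pd

# Made-up data: a plain integer index column, like the 1..304 case in the report.
numeric_csv = "id,value\n1,10\n2,20\n3,30\n"

# With parse_dates left at its default (False), the index stays numeric.
df = pd.read_csv(io.StringIO(numeric_csv), index_col=0)
print(df.index.dtype)

# Made-up time series data: date parsing of the index is requested explicitly.
dated_csv = "date,value\n2011-10-10,1\n2011-10-11,2\n"
ts = pd.read_csv(io.StringIO(dated_csv), index_col=0, parse_dates=True)
print(type(ts.index).__name__)
```

Making date parsing opt-in means a numeric index is never silently reinterpreted, while time series users add one keyword.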

@wesm
Member

wesm commented Oct 12, 2011

Alright, I'm just going to do this. I am also going to change the default for index_col to None-- I don't like breaking APIs but in retrospect index_col=0 was the wrong default. In certain cases (where the number of names in the header is 1 less than the number of columns of data) it should still be able to auto-infer index_col=0, however (there is a test for this, I think).
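The auto-inference case can be sketched as follows. This assumes a current pandas release and made-up data; the header names two columns while each data row has three fields, so the first field is taken as the index.

```python
import io

import pandas as pd

# Header has 2 names, data rows have 3 fields: first field becomes the index.
short_header_csv = "a,b\nx,1,2\ny,3,4\n"
inferred = pd.read_csv(io.StringIO(short_header_csv))
print(inferred)

# Header and data widths match: with index_col=None no column is used as the
# index, and a default integer index is generated instead.
full_header_csv = "a,b\n1,2\n3,4\n"
plain = pd.read_csv(io.StringIO(full_header_csv))
print(plain)
```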

@takluyver
Contributor Author

Oh, you're already addressing the next issue I filed. Kudos for responding so quickly!

@wesm
Member

wesm commented Oct 12, 2011

Again, tragic to break APIs but changing to parse_dates=False and index_col=None by default makes the most sense. I'll make sure to sound the alarm at next release time so that users review their usage of read_csv / read_table. Could use more unit testing on this-- also need to update the sphinx docs to reflect the new defaults and to document usage with MultiIndex (e.g. index_col=[0, 1])-- if you get some energy feel free to do so. I'll create a separate ticket about updating the docs
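As a rough sketch of the MultiIndex usage mentioned above, assuming a current pandas release and made-up column names and values:

```python
import io

import pandas as pd

# Made-up data: the first two columns together form a hierarchical index.
csv_text = "year,month,value\n2011,10,1.5\n2011,11,2.5\n"

# Passing a list to index_col builds a MultiIndex from those columns.
multi = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])
print(multi.index.names)
print(multi.loc[(2011, 10), "value"])
```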

@wesm wesm closed this as completed Oct 12, 2011
@takluyver
Contributor Author

Sure, I'll see what I can do.

Presumably this makes the next release 0.5, rather than 0.4.4, since APIs are changing.

@wesm
Member

wesm commented Oct 12, 2011

That seems like a good idea. If any urgent bug fixes come along I can create a 0.4.x maintenance branch for cherry-picking, otherwise, I suppose I can target end of month or early November for a 0.5 release. Lot of stuff I want to do before then.

dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019