read_csv interprets index column as dates #225

takluyver · 2011-10-12T14:19:49Z

I loaded a dataset with an index column going from 1 to 304. When I then try to get the repr of the DataFrame, it raises ValueError: year=100 is before 1900; the datetime strftime() methods require year >= 1900. Looking at df.index shows that the numeric indices have been transformed into datetimes (1-31 became days of the current month, 32-60 became years 20xx, 61-99 became years 19xx, and higher numbers were taken as a raw year).
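The mapping above looks like a permissive date parser's heuristics applied to bare integers. Below is a minimal Python model of that behavior. It assumes a two-digit-year pivot at 50 years past the current year (the 2011 values reported above fit this) and fills missing fields from today's date. `guess_date_from_int` is a hypothetical name; this is not pandas or dateutil code.

```python
from datetime import date


def guess_date_from_int(n, today):
    """Simplified model of the reported behavior (not the actual parser code).

    A bare integer is read the way a permissive date parser might read it:
    1-31 as a day of the current month, 32-99 as a two-digit year pivoted
    around today.year + 50, and anything >= 100 as a literal year.
    Fields that are not given are filled in from `today`.
    """
    if n < 1:
        raise ValueError("this model only covers positive integers")
    if n <= 31:
        # Small numbers fit in a day-of-month slot.
        return today.replace(day=n)
    if n < 100:
        # Two-digit year: put it in the current century first...
        year = n + today.year // 100 * 100
        # ...and roll back a century if that lands 50+ years in the future.
        if year >= today.year + 50:
            year -= 100
        return today.replace(year=year)
    # Three or more digits: taken as the year itself (e.g. year 100),
    # which Python 2's strftime rejected with the error quoted above.
    return today.replace(year=n)
```

For example, with `today = date(2011, 10, 12)` the model maps 5, 32, 61 and 304 to October 5 2011, 2032, 1961 and the year 304, matching the pattern described in the report.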


wesm · 2011-10-12T14:31:22Z

Hm, this is caused by a bit of legacy cruft. Perhaps it's time to add a parse_dates option to read_csv/read_table and have it default to False. Currently it tries to parse the index column on the assumption that it holds time series data, but it's apparently a bit too aggressive.
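To illustrate the proposed opt-in, here is a minimal stdlib sketch of a reader that converts the index column to dates only when asked. `read_indexed` is hypothetical, and it swaps in strict ISO 8601 parsing (`datetime.fromisoformat`) in place of the permissive guessing that caused this bug.

```python
import csv
import io
from datetime import datetime


def read_indexed(text, index_col=0, parse_dates=False):
    """Hypothetical reader returning (column_names, {index_value: row_values}).

    The index column is converted to datetimes only when parse_dates=True.
    Otherwise integer-looking values become ints and anything else stays a
    str, so an index of 1..304 is never mistaken for dates.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Drop the index column's name from the remaining column names.
    columns = header[:index_col] + header[index_col + 1:]
    table = {}
    for row in body:
        raw = row[index_col]
        if parse_dates:
            key = datetime.fromisoformat(raw)  # strict ISO 8601, no guessing
        elif raw.lstrip("-").isdigit():
            key = int(raw)
        else:
            key = raw
        table[key] = row[:index_col] + row[index_col + 1:]
    return columns, table
```

The design point is that date conversion is opt-in: the default path leaves an integer index as integers.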

wesm · 2011-10-12T14:50:58Z

Alright, I'm just going to do this. I'm also going to change the default for index_col to None. I don't like breaking APIs, but in retrospect index_col=0 was the wrong default. In certain cases (where the number of names in the header is one less than the number of columns of data) it should still be able to infer index_col=0 automatically, however (there is a test for this, I think).
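The inference rule in parentheses can be sketched as follows. `infer_index_col` is a hypothetical helper, and it assumes the header and the first data row have already been split into fields.

```python
def infer_index_col(header, first_row, index_col=None):
    """Hypothetical helper: honor an explicit index_col; otherwise treat
    column 0 as the index only when the header names exactly one fewer
    column than the data row contains."""
    if index_col is not None:
        return index_col
    if len(header) == len(first_row) - 1:
        # The unnamed leading column is assumed to hold row labels.
        return 0
    # Header and data line up: no index column is inferred.
    return None
```

With the new default of `index_col=None`, a file whose header and rows have the same width gets no index column, and only the "one name short" layout falls back to column 0.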

takluyver · 2011-10-12T14:54:46Z

Oh, you're already addressing the next issue I filed. Kudos for responding so quickly!


wesm · 2011-10-12T18:54:16Z

Again, it's tragic to break APIs, but changing to parse_dates=False and index_col=None by default makes the most sense. I'll make sure to sound the alarm at next release time so that users review their usage of read_csv / read_table. This could use more unit testing. The Sphinx docs also need to be updated to reflect the new defaults and to document usage with MultiIndex (e.g. index_col=[0, 1]); if you get some energy, feel free to do so. I'll create a separate ticket about updating the docs.
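For the MultiIndex case, passing a list such as `index_col=[0, 1]` means each row is identified by a combination of fields. A rough stdlib sketch of that idea follows; `split_index` is a hypothetical helper that keys rows by tuples rather than building a real pandas MultiIndex.

```python
def split_index(rows, index_cols):
    """Hypothetical helper: key each row by a tuple of the fields at
    index_cols (like index_col=[0, 1]) and keep the remaining fields as
    the row's values."""
    out = {}
    for row in rows:
        # One tuple per row: each position corresponds to one index level.
        key = tuple(row[i] for i in index_cols)
        out[key] = [v for i, v in enumerate(row) if i not in index_cols]
    return out
```

For instance, rows `["a", "1", "x"]` and `["a", "2", "y"]` share the first level `"a"` but remain distinct through the second level.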

takluyver · 2011-10-12T19:16:35Z

Sure, I'll see what I can do.

Presumably this makes the next release 0.5, rather than 0.4.4, since APIs are changing.

wesm · 2011-10-12T20:21:48Z

That seems like a good idea. If any urgent bug fixes come along, I can create a 0.4.x maintenance branch for cherry-picking; otherwise, I suppose I can target the end of the month or early November for a 0.5 release. Lots of stuff I want to do before then.

wesm added a commit that referenced this issue Oct 12, 2011

ENH: parser API changes, added parse_dates options, address GH #225, #226

0cc5616

wesm closed this as completed Oct 12, 2011

dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019

Add rename support for Libraries in Arctic (pandas-dev#225)

e963468
