
BUG: in read_csv, keep_date_cols doesn't result in correct dtype #13378


Closed
Thomasillo opened this issue Jun 6, 2016 · 9 comments · Fixed by #44633
Labels
Bug IO CSV read_csv, to_csv

@Thomasillo

Thomasillo commented Jun 6, 2016

>>> import pandas
>>> import io
>>> data = """A
20150908
20150909
"""
>>> t=pandas.read_csv(io.StringIO(data))
>>> t.dtypes
A int64
dtype: object
>>> t=pandas.read_csv(io.StringIO(data),parse_dates={'date':['A']},keep_date_col=True)
>>> t.dtypes
date datetime64[ns]
A object
dtype: object

The second time, the datatype of the column 'A' should also be int64.

@jreback
Contributor

jreback commented Jun 6, 2016

@Thomasillo I'm not sure the docstring actually says that it should be parsed first, as it only applies to a list-of-lists of dates.

Not very many tests for this either.

So not clear if this is an issue or not.

cc @gfyoung

jreback added the Usage Question and IO CSV (read_csv, to_csv) labels on Jun 6, 2016
@gfyoung
Member

gfyoung commented Jun 6, 2016

@jreback : I consider this behaviour a bug. Casting should still be done if the column is being kept because it still is a data column after. I think we'll need to remove this function here and just do the date casting first before doing other data conversions, but not sure yet.

@Thomasillo : For the time being, try using the Python engine (engine='python') instead. It will correctly convert to int64 for the retained parsed date column.

jreback added this to the Next Major Release milestone on Jun 6, 2016
@gfyoung
Member

gfyoung commented Jun 6, 2016

Again, this issue highlights a major discrepancy between the two engines: the order in which conversions are applied does not match. As we can see here, the C engine applies the data conversions before the date conversions, whereas the Python engine does it the other way around. This is another reason why the ordering needs to be aligned so that the behaviour can be fixed.
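
A minimal sketch of a workaround that does not depend on either engine's ordering, assuming the goal is simply an integer column A plus a parsed date column: read the file without any date options, then build the date column explicitly with pandas.to_datetime. This is not the fix that closed the issue, and the column name date just mirrors the report above.

```python
import io

import pandas as pd

data = """A
20150908
20150909
"""

# Step 1: plain read with no date options, so column A is inferred as an integer.
t = pd.read_csv(io.StringIO(data))

# Step 2: build the datetime column explicitly from a string copy of A,
# leaving the original column A untouched.
t["date"] = pd.to_datetime(t["A"].astype(str), format="%Y%m%d")

print(t.dtypes)
```

Because the date parsing happens after read_csv has returned, the result should be the same whichever engine is selected.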

@gfyoung
Member

gfyoung commented Jun 6, 2016

@jreback : IMO the difficulty is at least "intermediate," as this issue is another manifestation of the larger issue I described above.

jreback changed the title from "in read_csv, keep_date_cols doesn't result in correct dtype" to "BUG: in read_csv, keep_date_cols doesn't result in correct dtype" on Jun 6, 2016
@jreback
Contributor

jreback commented Jun 6, 2016

whoever fixes this will get 'intermediate' points (to be spent just like virtual cash :>) 😉

@Thomasillo
Author

@gfyoung : Thanks. However, I had to solve the problem differently for now.

On 06.06.2016 at 16:46, gfyoung wrote:

@jreback : I consider this behaviour a bug. Casting should still be done if the column is being kept.
@Thomasillo : For the time being, try using Python engine instead. It will correctly convert to int64 for the retained parsed date column.



@gfyoung
Member

gfyoung commented Jun 6, 2016

@jreback : fair enough 😃

@Thomasillo : cool - thanks for bringing up the issue!

@holy-motors

The reported bug is still present in version 0.24.1, but there has been no activity here for the last three years. Shall we close it for now?

@gfyoung
Member

gfyoung commented May 8, 2019

@holy-motors : It just means that no good solution has been found for it yet. You are more than welcome to investigate if you like!


5 participants