string to date format ignored on apply #3669

Closed
hayd opened this issue May 21, 2013 · 7 comments
Labels
Bug, IO Data (IO issues that don't fit into a more specific label)
Comments

@hayd
Contributor

hayd commented May 21, 2013

From the SO question.

I think apply should be passing on the format keyword argument:

In [1]: s = pd.Series(['12/1/2012', '30/01/2012'])

In [2]: s.apply(pd.to_datetime, format='%d/%m/%Y')
Out[2]:
0   2012-12-01 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]

In [3]: pd.to_datetime(s, format='%d/%m/%Y')
Out[3]:
0   2012-01-12 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]
@jorisvandenbossche
Member

This seems to be the case only for Series, not for DataFrames:

>>> import pandas as pd
>>> pd.__version__
'0.11.0'
>>> s = pd.Series(['12/1/2012', '30/01/2012'])
>>> s.apply(pd.to_datetime, format='%d/%m/%Y')
0   2012-12-01 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]
>>> df = pd.DataFrame(s)
>>> df
            0
0   12/1/2012
1  30/01/2012
>>> df.apply(pd.to_datetime, format='%d/%m/%Y')
                    0
0 2012-01-12 00:00:00
1 2012-01-30 00:00:00
>>> df[0].apply(pd.to_datetime, format='%d/%m/%Y')
0   2012-12-01 00:00:00
1   2012-01-30 00:00:00
Name: 0, dtype: datetime64[ns]
>>> df[[0]].apply(pd.to_datetime, format='%d/%m/%Y')
                    0
0 2012-01-12 00:00:00
1 2012-01-30 00:00:00

@hayd
Contributor Author

hayd commented May 21, 2013

I'm thinking maybe this has something to do with dayfirst. Perhaps it should default to None and we should check it (or is this handled externally? I'm sure I have looked into this, or something similar, before). It seems to interfere here:

In [21]: pd.to_datetime(s[0], format='%d/%m/%Y')
Out[21]: datetime.datetime(2012, 12, 1, 0, 0)

In [22]: pd.to_datetime(s[0], format='%d/%m/%Y', dayfirst=True)
Out[22]: datetime.datetime(2012, 1, 12, 0, 0)

Not cool.

@jorisvandenbossche
Member

As you see in the example you gave, there is also a difference between a string and a series:

>>> pd.to_datetime(s, format='%d/%m/%Y')
0   2012-01-12 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]
>>> pd.to_datetime(s[0], format='%d/%m/%Y')
datetime.datetime(2012, 12, 1, 0, 0)

I looked at the code, and I may be overlooking something entirely, but it seems that the format argument is not used when dealing with a single string: https://github.com/pydata/pandas/blob/master/pandas/tseries/tools.py#L135, which could explain the behaviour when adding dayfirst.
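As a reference point, here is what a strict reading of each format produces for s[0], using only the standard library's datetime.strptime (this is not the pandas code path, just an illustration). With '%d/%m/%Y' the string parses as 12 January. The 1 December value seen above is what a month-first reading gives, which is consistent with the format not being applied to a single string.

```python
from datetime import datetime

raw = '12/1/2012'  # s[0] from the examples in this thread

# Strict parse with the format passed in the thread: day first.
day_first = datetime.strptime(raw, '%d/%m/%Y')

# Strict parse with the opposite field order: month first.
month_first = datetime.strptime(raw, '%m/%d/%Y')

print(day_first)    # 2012-01-12 00:00:00
print(month_first)  # 2012-12-01 00:00:00
```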

@jorisvandenbossche
Member

Could that also be the reason that the s.apply(pd.to_datetime, format='%d/%m/%Y') from the original question does not work?

  • apply on Series -> individual values of the Series are fed to the function -> strings -> format not used (dateutil.parse)
  • apply on DataFrame (column) -> a whole Series is fed to the function -> format is used (tslib.array_strptime(arg, format))
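The two paths above can be checked with a small sketch. The recording helpers (record_element, record_column) are hypothetical names made up for this illustration, and the dateutil calls only stand in for the fallback parser named above; they are not the pandas internals themselves. It assumes pandas and python-dateutil are installed.

```python
import pandas as pd
from dateutil import parser

s = pd.Series(['12/1/2012', '30/01/2012'])
df = pd.DataFrame(s)

element_types = []  # what Series.apply hands to the function
column_types = []   # what DataFrame.apply hands to the function

def record_element(value):
    # Hypothetical helper: note the type of each argument received.
    element_types.append(type(value).__name__)
    return value

def record_column(column):
    # Hypothetical helper: note the type of each argument received.
    column_types.append(type(column).__name__)
    return column

s.apply(record_element)   # called once per value
df.apply(record_column)   # called once per column

print(set(element_types))  # {'str'}
print(set(column_types))   # {'Series'}

# dateutil guesses the field order of a single string and takes no format;
# only dayfirst changes how an ambiguous date is read.
guessed = parser.parse('12/1/2012')
guessed_dayfirst = parser.parse('12/1/2012', dayfirst=True)
print(guessed)           # 2012-12-01 00:00:00
print(guessed_dayfirst)  # 2012-01-12 00:00:00
```

If the per-element path really falls back to dateutil, this would account for both Out[21] and Out[22] earlier in the thread.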

@jorisvandenbossche
Member

I think this can be closed. Solved by #3890.

@hayd
Contributor Author

hayd commented Jul 6, 2013

@jorisvandenbossche no, I don't think so, master still shows the same behaviour as the first post.

@hayd
Contributor Author

hayd commented Jul 6, 2013

@jorisvandenbossche I'm talking nonsense! You're right!

2 participants